Applied Analysis 
P. Ouwehand 



Department of Mathematical Sciences 
Stellenbosch University 



Contents 



1 Metric Spaces, Normed Spaces and Inner Product Spaces 1 

1.1 The Geometry of $\mathbb{R}^n$ 1

1.2 Convergence and Continuity in Metric Spaces 3 

1.3 Normed Spaces 6 

1.4 Inner Product Spaces 7 

1.5 Linear Operators 12 

1.6 Projections in Hilbert Spaces 13 

2 Basic Notions of Topology 17 

2.1 Countable and Uncountable Sets 17 

2.2 Open Sets and the Interior Operation 21 

2.3 Closed Sets and the Closure Operation 24 

2.4 Compact Spaces and Sets 27 

2.5 Compactness in $\mathbb{R}^n$ 31

2.6 Convergence and Continuity 32 

2.6.1 Pull-backs and Push-forwards 32 

2.6.2 Topological Characterizations of Convergence and Continuity 34 

2.6.3 Continuity and Compactness 35 

2.7 Separable Spaces 37 

2.8 The Banach Space $C[a,b]$ 37

2.8.1 Uniform Convergence 37 

2.8.2 Compactness: Arzela-Ascoli Theorem* 41 

2.8.3 Separability: Stone-Weierstrass Theorem* 42 

3 Motivation for Measure Theory 47 

3.1 What is "Area"? 47 

3.2 Shortcomings of the Riemann Integral 48 

3.3 Motivation from Probability Theory 51 

3.4 Structure of Events 52 

4 Measure Spaces 55 

4.1 Events and $\sigma$-algebras 55

4.2 Measures 57 

4.3 Continuity Properties of Measures 60 

4.3.1 Limit Operations on Sets 60 

4.3.2 Limits of Sets and Measures 61 




4.4 Lebesgue Measure from Coin Tossing 62

4.5 Some Probability Theory 66 

4.6 Extension of Measures* 70 

4.6.1 Other Families of Sets* 70 

4.6.2 The Extension Theorem* 71 

4.6.3 Completion of Measure Spaces* 75 

4.7 Lebesgue Measure 76 

4.7.1 Lebesgue Measure on $\mathbb{R}$ 76

4.7.2 Lebesgue Measure on $\mathbb{R}^n$ 79

5 Measurable Functions and Random Variables 81

5.1 Definition of Measurable Function 81 

5.2 Combinations of Measurable Functions 84 

5.3 Measures and $\sigma$-algebras from Measurable Functions 88

5.4 Information 90 

6 Integration 93 

6.1 Definition and Basic Properties 93 

6.2 Lebesgue's Dominated Convergence Theorem 98 

6.3 Measure Zero 100 

6.4 Chain Rule, Change of Variables 103

6.5 Riemann Integral vs. Lebesgue Integral 105 

7 Differentiation 107 

7.1 Bounded Linear Operators 107 

7.2 The Derivative 108 

7.2.1 Definition of the Derivative 108 

7.2.2 The Chain Rule 112 

7.2.3 Components 114

7.2.4 Partial Derivatives and the Jacobian Matrix 116 

7.2.5 A Sufficient Condition for the Existence of $Df(x)$ 118

7.2.6 The Chain Rule: Reprise 120 

7.2.7 Further Manipulation Rules 121 

7.2.8 Directional Derivatives 122 

7.3 Taylor Theorems 124 

7.3.1 Taylor's Theorem for One Variable 124 

7.3.2 A Higher-Order Taylor Theorem 127 

7.4 Maxima and Minima 133 

7.4.1 Topological Facts about Extrema 133 

7.4.2 Maxima and Minima via Calculus 134 

7.4.3 Linear Least Squares 139 

8 Products and Independence 141 

8.1 The Monotone Class Theorem 141 

8.2 Products 144 

8.2.1 Introduction 144 

8.2.2 Products of Measure Spaces 145 






8.3 Independence 148 

9 The $\mathcal{L}^p$ Spaces and Fourier Analysis 153

9.1 $\mathcal{L}^p$ Spaces 153

9.1.1 Integration of complex-valued functions 153

9.1.2 Definition of $\mathcal{L}^p$-spaces 154

9.1.3 $\mathcal{L}^1$ and $\mathcal{L}^2$ 155

9.1.4 General Theory of $\mathcal{L}^p$-spaces* 160

9.2 Geometry of Hilbert Space and Generalized Fourier Series 164 

9.2.1 Projections in Hilbert Spaces 164 

9.2.2 Orthonormal Bases 167 

9.3 Fourier Series 172

9.3.1 Statement of Results 172 

9.3.2 Examples 174 

9.3.3 Proofs* 175 

10 Weak Convergence and Characteristic Functions 181 

10.1 Weak Convergence and Convergence in Distribution 181 

10.2 Characteristic Functions 187 

10.2.1 Basic Properties 187 

10.2.2 Inversion 191 

10.2.3 Weak Convergence and Characteristic Functions 196 

10.3 The Central Limit Theorem 198 

11 Conditional Expectation and Martingales 201 

11.1 Information and Expectation 201 

11.1.1 Conditioning on an Event 201 

11.1.2 Conditioning on a Random Variable 203 

11.1.3 Conditioning on a $\sigma$-Algebra 206

11.2 Theory of Martingales in Discrete Time 212 

11.2.1 Stochastic Processes and Filtrations 212 

11.2.2 Martingales, Submartingales, Supermartingales 213 

11.2.3 Games and Strategies 216 

12 PDEs in Finance, with a Detour Through Black-Scholes 221 

12.1 Modelling Stock Prices 221 

12.1.1 Modelling Returns in Continuous Time 222

12.1.2 Modelling Share Prices in Continuous Time 225 

12.2 A Naive Approach to Stochastic Calculus 226 

12.3 The Black-Scholes Model 229 

12.3.1 The Black-Scholes PDE 229

12.3.2 Pricing in the Risk-Neutral World 230 

12.3.3 The Distribution of Asset Prices 232 

12.4 Option Pricing: The Black-Scholes Formula 236 






13 Introduction to PDEs 241 

13.1 What is a PDE? 241 

13.1.1 Types of PDEs 242 

13.2 Solutions to a PDE 242 

13.2.1 Contrast with ODEs 242 

13.2.2 First-Order Linear PDEs 243 

13.2.3 Initial- and Boundary Conditions 247 

13.2.4 Well-Posed Problems 250 

13.3 Classification of Linear Second-Order PDEs 251 

13.4 Characteristics for 2nd-Order Linear PDEs 256

14 Laplace's Equation 263 

14.1 The Divergence Theorem and Related Results 263 

14.2 Harmonic Functions 265 

14.2.1 Some Heuristic Remarks about the Laplace Operator 266 

14.2.2 Volumes and Surface Areas of Balls in $\mathbb{R}^n$ 267

14.2.3 Mean-Value Property and the Maximum Principle 268 

14.3 Solving Laplace's Equation 272 

14.3.1 Uniqueness of Solutions 272 

14.3.2 Fundamental Solution of Laplace's Equation 274 

14.3.3 Green's Functions 277 

14.3.4 Properties of Green's functions 279 

14.4 Green's Functions: Examples and Exercises 282 

14.4.1 Green's function for the half-space 282 

14.4.2 Green's function for the ball 285 

15 The Heat Equation 289 

15.1 Separation of Variables 289 

15.2 The Fundamental Solution 294 

15.3 Solving the Heat Equation 298 

15.3.1 The Cauchy Problem 298 

15.3.2 Diffusion on the Half-Line: The Method of Images 300 

15.4 Applications to Finance 302 

15.4.1 The Black-Scholes Option Formula 302 

15.4.2 Barrier Options 303 

15.5 Distributions 306 

15.5.1 Basic Definitions 306 

15.5.2 Convergence of Distributions 309 

15.5.3 Differentiation of Distributions 310 

15.6 Green's Functions Revisited 312 

15.6.1 Laplace's Equation 312 

15.6.2 The Heat Equation 313 

16 The Radon-Nikodym Theorem* 315 

16.1 Definitions and Statement of Radon-Nikodym Theorem 315 

16.2 Proof of the Radon-Nikodym Theorem; Related Results 316 

16.3 Products 321 






16.3.1 Introduction 321 

16.3.2 Products of Measure Spaces 322 

A Convergence in $\mathbb{R}$ 327

A.1 Definition of Convergence 327

A.2 The Completeness Axiom 330

A.3 limsup and liminf; Subsequences 333

A.4 Cauchy Sequences and Completeness 340

B Sets and Logic 345 

B.1 Logic, Formal Languages, Quantifiers 345

B.2 Basic Set Theory 347 

B.2.1 Sets 347 

B.2.2 Union and intersection 348

B.2.3 Set difference, complementation and symmetric difference 349

B.2.4 Set algebra 349 

B.2.5 Products 351 

B.3 The Extended Real Number System 352 



Chapter 1 



Metric Spaces, Normed Spaces and 
Inner Product Spaces 



1.1 The Geometry of $\mathbb{R}^n$

I will assume that you are thoroughly familiar with the following facts and notions: 

• $\mathbb{R}^n$ is the set of all ordered $n$-tuples with components in $\mathbb{R}$:

$$\mathbb{R}^n = \{(r_1, \dots, r_n) : r_i \in \mathbb{R},\ 1 \le i \le n\}$$

$\mathbb{R}^1$ is identified with $\mathbb{R}$, and called the real line; $\mathbb{R}^2$ is called the real plane.
$\mathbb{R}^n$ is commonly referred to as $n$-dimensional Euclidean space, and also Cartesian space.
The elements of $\mathbb{R}^n$ may also be referred to as real $n$-dimensional vectors.

• $\mathbb{R}^n$ can be endowed with operations of addition and scalar multiplication:

$$(x_1, \dots, x_n) + (y_1, \dots, y_n) = (x_1 + y_1, \dots, x_n + y_n)$$

$$a(x_1, \dots, x_n) = (ax_1, \dots, ax_n), \qquad a \in \mathbb{R}$$

This makes $\mathbb{R}^n$ into an $n$-dimensional vector space (over the scalar field $\mathbb{R}$). The vector

$$0 = (0, \dots, 0)$$

is an identity element for the operation of addition. We denote it simply by $0$.

Thus we use the same symbol for the number $0$, the vector $(0, 0)$, the vector $(0, 0, 0)$, etc. Which zero is meant will be obvious from context.

• $\mathbb{R}^n$ can be equipped with an inner product, a map

$$\langle \cdot, \cdot \rangle : \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}$$

defined as follows: If $x = (x_1, \dots, x_n)$ and $y = (y_1, \dots, y_n)$, then

$$\langle x, y \rangle = x_1 y_1 + \dots + x_n y_n$$

The inner product on $\mathbb{R}^1$ is just ordinary multiplication. On $\mathbb{R}^n$, the inner product is also called the dot product and often denoted by $\langle x, y \rangle = x \cdot y$. It has the following properties, which you are invited to verify yourself:






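(i) $\langle x, x \rangle \ge 0$ for all $x \in \mathbb{R}^n$;

(ii) $\langle x, x \rangle = 0$ if and only if $x = 0$;

(iii) $\langle x, y \rangle = \langle y, x \rangle$ for all $x, y \in \mathbb{R}^n$;

(iv) $\langle x, y + z \rangle = \langle x, y \rangle + \langle x, z \rangle$ for all $x, y, z \in \mathbb{R}^n$;

(v) $\langle ax, y \rangle = a \langle x, y \rangle$ for all $x, y \in \mathbb{R}^n$ and $a \in \mathbb{R}$.

A vector space $V$ equipped with a map $\langle \cdot, \cdot \rangle$ which satisfies (i)-(v) is called an inner product space.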

• The space $\mathbb{R}^n$ can be equipped with a norm or length

$$\| \cdot \| : \mathbb{R}^n \to \mathbb{R}^+$$

defined by

$$\|x\| = \sqrt{\langle x, x \rangle} = \sqrt{x_1^2 + \dots + x_n^2}$$

The norm in $\mathbb{R}^1$ is just the usual absolute value. The norm satisfies the following conditions:

(i) $\|x\| \ge 0$ for all $x \in \mathbb{R}^n$;

(ii) $\|x\| = 0$ if and only if $x = 0$;

(iii) $\|ax\| = |a| \, \|x\|$ for all $x \in \mathbb{R}^n$ and $a \in \mathbb{R}$;

(iv) $\|x + y\| \le \|x\| + \|y\|$ for all $x, y \in \mathbb{R}^n$ (Triangle Inequality);

A vector space $V$ equipped with a map $\| \cdot \|$ which satisfies (i)-(iv) is called a normed vector space. An inner product will always induce a norm by putting $\|x\| = \sqrt{\langle x, x \rangle}$. However, a norm need not be induced by an inner product.

• $\mathbb{R}^n$ can be equipped with a metric, or distance

$$d(\cdot, \cdot) : \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}^+$$

defined by

$$d(x, y) = \|x - y\| = \sqrt{(x_1 - y_1)^2 + \dots + (x_n - y_n)^2}$$

$d(x, y)$ is simply the distance between $x$ and $y$. The metric $d$ satisfies the following conditions:

(i) $d(x, y) \ge 0$ for all $x, y \in \mathbb{R}^n$;

(ii) $d(x, y) = 0$ if and only if $x = y$;

(iii) $d(x, y) = d(y, x)$ for all $x, y \in \mathbb{R}^n$;

(iv) $d(x, z) \le d(x, y) + d(y, z)$ for all $x, y, z \in \mathbb{R}^n$ (Triangle Inequality);

The Triangle Inequality for the metric follows directly from the Triangle Inequality for the norm:

$$d(x, z) = \|x - z\| = \|(x - y) + (y - z)\| \le \|x - y\| + \|y - z\| = d(x, y) + d(y, z)$$

Note that properties (i)-(iv) for $d$, unlike the properties for the inner product and norm, do not mention addition or scalar multiplication at all. A set $X$ (not necessarily a vector space) equipped with a map $d$ satisfying (i)-(iv) is called a metric space. A norm will always induce a metric by putting $d(x, y) = \|x - y\|$.
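The chain inner product, norm, metric can be tried out numerically. The following is a minimal sketch in plain Python; the function names `inner`, `norm` and `dist` are illustrative choices, not notation from the text, and vectors are represented as lists of floats.

```python
import math

def inner(x, y):
    """Dot product <x, y> = x_1*y_1 + ... + x_n*y_n."""
    return sum(a * b for a, b in zip(x, y))

def norm(x):
    """Euclidean norm induced by the inner product: sqrt(<x, x>)."""
    return math.sqrt(inner(x, x))

def dist(x, y):
    """Metric induced by the norm: d(x, y) = ||x - y||."""
    return norm([a - b for a, b in zip(x, y)])

x, y, z = [3.0, 4.0], [0.0, 0.0], [1.0, 1.0]
print(norm(x))                                 # 5.0
print(dist(x, y) == dist(y, x))                # symmetry
print(dist(x, z) <= dist(x, y) + dist(y, z))   # triangle inequality
```

A check on a few sample vectors is of course not a proof of properties (i)-(iv); it only illustrates how each structure is built from the previous one.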






1.2 Convergence and Continuity in Metric Spaces 

Many of the structures on the space $\mathbb{R}^n$ that we discussed in the previous section have analogues in different kinds of spaces. The most basic structure that we shall need in this course is the notion of a metric space:

Definition 1.2.1 A metric space is a pair $(X, d)$ consisting of a set $X$ together with a map $d : X \times X \to \mathbb{R}$, called a metric, which satisfies the following conditions:

(i) $d(x, y) \ge 0$ for all $x, y \in X$;

(ii) $d(x, y) = 0$ if and only if $x = y$;

(iii) $d(x, y) = d(y, x)$ for all $x, y \in X$;

(iv) $d(x, z) \le d(x, y) + d(y, z)$ for all $x, y, z \in X$ (Triangle Inequality);

If $(X, d)$ is a metric space, and $x, y$ are points in $X$, then $d(x, y)$ should be interpreted as the distance between $x$ and $y$. With this interpretation in mind, parts (i)-(iv) of Defn. 1.2.1 can be easily understood. Though we have distilled the definition of metric from our experience with distances in $\mathbb{R}^n$, there are many metric spaces which do not even remotely resemble $\mathbb{R}^n$. For example:

Example 1.2.2 (1) Let $X$ be any^1 non-empty set, and define

$$d : X \times X \to \mathbb{R} : (x, y) \mapsto \begin{cases} 0 & \text{if } x = y \\ 1 & \text{if } x \ne y \end{cases}$$

Then $(X, d)$ is a metric space.

Here $d$ is called the discrete metric, and $(X, d)$ is called a discrete space.

(2) Let $(X, d)$ be a metric space, and let $Y \subseteq X$. Let $d_Y$ be the restriction of $d$ to $Y$, i.e. $d_Y$ is defined only on $Y \times Y$, and coincides there with $d$. Then $(Y, d_Y)$ is a metric space. Now even though $(X, d)$ might be "nice", $(Y, d_Y)$ could be quite "wild". For example, we could take $(X, d)$ to be the set of reals with the usual metric $d(x_0, x_1) := |x_0 - x_1|$, and let $Y$ be the set of irrational numbers.

□ 

Thus many familiar operations on $\mathbb{R}^n$ may not be possible on an abstract metric space. Addition and multiplication may be undefined, there may be no order relation, etc. But one operation is possible: It is possible to define limits. For once we have a distance, we can talk about convergence: $x_n$ converges to $x$ iff the distance $d(x_n, x)$ between $x_n$ and $x$ converges to $0$. Thus convergence of a sequence of abstract points $(x_n)_n$ is defined in terms of convergence of a sequence of real numbers $(d(x_n, x))_n$, and we already know what we mean by convergence of real numbers!



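Definition 1.2.3 Let $(X, d)$ be a metric space, and let $x_n, x \in X$ (for $n \in \mathbb{N}$). We say that

$$x_n \to x \quad \text{iff} \quad d(x_n, x) \to 0$$

Thus

$$x_n \to x \quad \text{iff} \quad \forall \varepsilon > 0 \ \exists N \ \forall n \ge N \ [d(x_n, x) < \varepsilon]$$

We write $\lim_n x_n = x$ when $x_n \to x$.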





^1 $X$ might be a set of cows, for example.






Exercise 1.2.4 (a) Show that a sequence in a metric space can have at most one limit.

(b) Show that if a sequence $(x_n)_n$ converges in a metric space, then every subsequence of $(x_n)_n$ converges as well, and to the same limit.
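The definition of convergence says that the real numbers $d(x_n, x)$ tend to $0$, so whether a sequence converges depends on the metric as well as on the points. As a rough illustration, the sketch below looks at the sequence $x_n = (1/n, 1 + 1/n)$ in $\mathbb{R}^2$ under the Euclidean metric and under the discrete metric. The sequence, the helper `first_index_within` and the search cut-off `n_max` are all assumptions made for this example.

```python
import math

def euclid(x, y):
    """Usual Euclidean distance on R^2 (or R^n)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def discrete(x, y):
    """Discrete metric: 0 if the points coincide, 1 otherwise."""
    return 0 if x == y else 1

def seq(n):
    """Illustrative sequence x_n = (1/n, 1 + 1/n)."""
    return (1.0 / n, 1.0 + 1.0 / n)

limit = (0.0, 1.0)

def first_index_within(eps, d, n_max=10_000):
    """Smallest n <= n_max with d(x_n, limit) < eps, or None if there is none."""
    for n in range(1, n_max + 1):
        if d(seq(n), limit) < eps:
            return n
    return None

print(first_index_within(0.01, euclid))    # an N that works for eps = 0.01
print(first_index_within(0.5, discrete))   # None: x_n never equals the limit
```

For the Euclidean metric the distances $\sqrt{2}/n$ decrease, so the first index found also serves as the $N$ in the definition. Under the discrete metric the same points stay at distance $1$ from $(0, 1)$, so the sequence does not converge there.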



Once we have distance, we can also define the notion of continuity. Intuitively, a function $f$ is continuous at a point $x_0$ iff whenever $x$ is "very close" to $x_0$, then $f(x)$ is "very close" to $f(x_0)$. Slightly more precisely:

We can make $f(x)$ as "close" to $f(x_0)$ as we wish, by taking $x$ sufficiently "close" to $x_0$.

There are no "jumps" then, because the closer $x$ gets to $x_0$, the closer $f(x)$ gets to $f(x_0)$. This is not very precise, so we need to proceed with care:

• Continuity makes sense for a function $f : X \to Y$ between two metric spaces $(X, d_X)$ and $(Y, d_Y)$. We can use $d_X$ to talk about "closeness" of $x, x_0$ in $X$, and $d_Y$ to talk about "closeness" of $f(x), f(x_0)$ in $Y$.

• Since "closeness" is subjective, we will demand that it holds for absolutely anybody's idea of "close". Specifically, suppose you define "close" by specifying some real number $\varepsilon > 0$ and saying "$f(x), f(x_0)$ are "close" in $Y$ whenever $d_Y(f(x), f(x_0)) < \varepsilon$". Intuitively, then, $f$ is continuous at $x_0$ iff whenever $x$ is "sufficiently close" to $x_0$, then

$$d_Y(f(x), f(x_0)) < \varepsilon$$

• This "sufficiently close" can now be described by some real number $\delta > 0$: $f$ is continuous at $x_0$ if there is $\delta > 0$ such that

$$d_Y(f(x), f(x_0)) < \varepsilon \quad \text{whenever} \quad d_X(x, x_0) < \delta$$

• Since this must hold for anyone's idea of "closeness", i.e. for any $\varepsilon > 0$, we now make the following definition:

Definition 1.2.5 Suppose that $f : (X, d_X) \to (Y, d_Y)$ is a function between metric spaces, and let $x_0 \in X$. We say that $f$ is continuous at $x_0$ iff

$$\forall \varepsilon > 0 \ \exists \delta > 0 \ \forall x \in X \ [d_X(x, x_0) < \delta \Rightarrow d_Y(f(x), f(x_0)) < \varepsilon]$$

We say that $f : (X, d_X) \to (Y, d_Y)$ is continuous iff it is continuous at every $x_0 \in X$.

Note that the notion of continuity depends on both the metric of the domain space and the metric of the range space.

Here is a characterization of continuity in terms of convergence of sequences.

Proposition 1.2.6 A map $f : (X, d_X) \to (Y, d_Y)$ is continuous at $x_0 \in X$ iff

Whenever $x_n \to x_0$ in $X$, then $f(x_n) \to f(x_0)$ in $Y$.

□



Exercise 1.2.7 We prove Propn. 1.2.6. Let $f : (X, d_X) \to (Y, d_Y)$, and let $x_0 \in X$.






(a) Suppose that $f$ is continuous at $x_0$, and assume $x_n \to x_0$ in $(X, d_X)$. We must show that $f(x_n) \to f(x_0)$ in $(Y, d_Y)$. So let $\varepsilon > 0$. Explain why there is $\delta > 0$ so that $d_Y(f(x), f(x_0)) < \varepsilon$ whenever $d_X(x, x_0) < \delta$. Also explain why there is $N \in \mathbb{N}$ so that $d_X(x_n, x_0) < \delta$ whenever $n \ge N$. Conclude that

$$d_Y(f(x_n), f(x_0)) < \varepsilon \quad \text{whenever } n \ge N$$

and explain why this implies that $f(x_n) \to f(x_0)$ in $(Y, d_Y)$.

(b) To prove the converse, suppose that $f$ is not continuous at $x_0$. By examining the definition of continuity, explain why there is an $\varepsilon > 0$ with the following property:

$$\forall \delta > 0 \ \exists x \in X \ [d_X(x, x_0) < \delta \wedge d_Y(f(x), f(x_0)) \ge \varepsilon]$$

Now take $\delta := \frac{1}{n}$ to find $x_n \in X$ with the property that $d_X(x_n, x_0) < \frac{1}{n}$, yet $d_Y(f(x_n), f(x_0)) \ge \varepsilon$. Conclude that $x_n \to x_0$ in $X$, yet $f(x_n) \not\to f(x_0)$ in $Y$. Explain why this proves the converse.

□ 
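The sequential criterion of Propn. 1.2.6 is often the quickest way to see that a function is not continuous at a point: it is enough to find one sequence $x_n \to x_0$ with $f(x_n) \not\to f(x_0)$. The sketch below uses two functions on $\mathbb{R}$ with the usual metric; the names `step` and `square` and the sequence $x_n = 1/n$ are choices made for this illustration only.

```python
def step(x):
    """Jumps at 0: value 0 for x <= 0 and 1 for x > 0."""
    return 1.0 if x > 0 else 0.0

def square(x):
    """A function that is continuous at 0."""
    return x * x

# x_n = 1/n converges to x_0 = 0 in the usual metric on R.
xs = [1.0 / n for n in range(1, 10001)]

# Distances |f(x_n) - f(0)| along the sequence.
step_gaps = [abs(step(x) - step(0.0)) for x in xs]
square_gaps = [abs(square(x) - square(0.0)) for x in xs]

print(step_gaps[-1])     # stays at 1.0, so step(x_n) does not tend to step(0)
print(square_gaps[-1])   # tiny, consistent with square(x_n) -> square(0)
```

For `step`, the single sequence shown already rules out continuity at $0$. For `square`, one sequence proves nothing by itself, since the criterion requires the conclusion for every sequence converging to $x_0$.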

Because an abstract metric space does not come with an order relation, we cannot define 

sup and inf in an arbitrary metric space. Hence the Completeness Axiom makes no sense in 
a metric space. We can, however, use the equivalence suggested by Exercise A.4.7: 

Definition 1.2.8 Let $(X, d)$ be a metric space.

(a) A sequence $(x_n)_n$ in $X$ is called a Cauchy sequence if and only if

$$\sup_{m, n \ge N} d(x_m, x_n) \to 0 \quad \text{as } N \to \infty$$

i.e. iff for every $\varepsilon > 0$ there is an $N \in \mathbb{N}$ such that

$$d(x_n, x_m) < \varepsilon \quad \text{whenever } n, m \ge N$$

(b) We say that $(X, d)$ is complete iff every Cauchy sequence in $X$ converges (to a limit which is in $X$).

Exercise 1.2.9 (a) Show that a convergent sequence is always a Cauchy sequence, in any metric space. 

(b) Show that a discrete metric space (cf. Example 1.2.2(1)) is necessarily complete. 

(c) Show that there exist metric spaces which are not complete, i.e. in which not all Cauchy sequences 
converge. 

□ 
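To get a feeling for how completeness can fail, it may help to look at the rational numbers with the usual metric. This example is not taken from the text above. The sketch runs Newton's iteration for $\sqrt{2}$ in exact rational arithmetic, so that every term is a point of $\mathbb{Q}$; the function name `newton_sqrt2` and the number of steps are assumptions of the example.

```python
from fractions import Fraction

def newton_sqrt2(steps):
    """Rational Newton iterates x_{k+1} = (x_k + 2/x_k)/2, starting at x_0 = 1."""
    x = Fraction(1)
    out = [x]
    for _ in range(steps):
        x = (x + 2 / x) / 2
        out.append(x)
    return out

xs = newton_sqrt2(6)

# Distances between successive terms shrink rapidly.
gaps = [abs(b - a) for a, b in zip(xs, xs[1:])]
print([float(g) for g in gaps])

# No term squares exactly to 2, as expected since sqrt(2) is irrational.
print(any(x * x == 2 for x in xs))   # False
```

The shrinking gaps are consistent with the sequence being Cauchy in $\mathbb{Q}$, while its limit in $\mathbb{R}$, namely $\sqrt{2}$, is not rational. The computation only illustrates this; turning it into a proof is a variant of Exercise 1.2.9(c).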

It is possible to define convergence and continuity in spaces which are even more abstract 
than metric spaces, using even vaguer notions of "closeness" which do not depend on having a 
distance function (i.e. a metric). In these so-called topological spaces, the fundamental notion 
is that of an open set. You will shortly get to see how this works. In many cases, however, the
spaces in which we work have more structure, rather than less, and it is to these that we now 
turn. 






1.3 Normed Spaces 

Throughout this section we consider vector spaces $V$ over the scalar field $\mathbb{R}$.^2

Definition 1.3.1 A normed space is a pair $(V, \| \cdot \|)$, where $V$ is a vector space and $\| \cdot \|$ is a norm on $V$, i.e. a function $\| \cdot \| : V \to \mathbb{R}$ with the following properties:

(i) $\|x\| \ge 0$ for all $x \in V$;

(ii) $\|x\| = 0$ if and only if $x = 0$;

(iii) $\|\alpha x\| = |\alpha| \, \|x\|$ for all $x \in V$ and $\alpha \in \mathbb{R}$;

(iv) $\|x + y\| \le \|x\| + \|y\|$ for all $x, y \in V$ (Triangle Inequality);

The norm $\|v\|$ of a vector $v$ should be interpreted as its length, as in the following standard examples.

Examples 1.3.2 (a) $V := \mathbb{R}$ with $\|v\| := |v|$ (the absolute value).

(b) $V := \mathbb{R}^n$ with the Euclidean norm

$$\|v\| := \sqrt{\sum_{k=1}^n v_k^2} \quad \text{where } v = (v_1, \dots, v_n)$$

□ 

Here are some other norms on M": 

Exercise 1.3.3 For $x \in \mathbb{R}^n$, let $x = (x_1, \dots, x_n)$.

(a) Define $\| \cdot \|_1$ on $\mathbb{R}^n$ by

$$\|x\|_1 = |x_1| + \dots + |x_n|$$

Show that $\| \cdot \|_1$ is a norm on $\mathbb{R}^n$.

(b) Define $\| \cdot \|_\infty$ on $\mathbb{R}^n$ by

$$\|x\|_\infty = \max\{|x_1|, \dots, |x_n|\}$$

Show that $\| \cdot \|_\infty$ is a norm on $\mathbb{R}^n$.

□ 
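The three norms on $\mathbb{R}^n$ met so far can be compared on concrete vectors. The following sketch computes $\| \cdot \|_1$, the Euclidean norm and $\| \cdot \|_\infty$, and spot-checks the triangle inequality on random samples. The function names, the sample vector and the small floating-point tolerance are assumptions of the example, and a random check is no substitute for the proofs asked for in Exercise 1.3.3.

```python
import math
import random

def norm1(x):
    """||x||_1 = |x_1| + ... + |x_n|."""
    return sum(abs(a) for a in x)

def norm2(x):
    """Euclidean norm sqrt(x_1^2 + ... + x_n^2)."""
    return math.sqrt(sum(a * a for a in x))

def norm_inf(x):
    """||x||_inf = max{|x_1|, ..., |x_n|}."""
    return max(abs(a) for a in x)

x = [3.0, -4.0, 0.0]
print(norm1(x), norm2(x), norm_inf(x))   # 7.0 5.0 4.0

# Spot-check the triangle inequality for each norm on random vectors.
random.seed(0)
triangle_ok = True
for _ in range(1000):
    u = [random.uniform(-10, 10) for _ in range(3)]
    v = [random.uniform(-10, 10) for _ in range(3)]
    s = [a + b for a, b in zip(u, v)]
    for nrm in (norm1, norm2, norm_inf):
        # Small tolerance guards against floating-point rounding.
        triangle_ok = triangle_ok and nrm(s) <= nrm(u) + nrm(v) + 1e-12
print(triangle_ok)
```

On this sample vector the values satisfy $\|x\|_\infty \le \|x\|_2 \le \|x\|_1$, an ordering that in fact holds for every $x \in \mathbb{R}^n$.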

We will give more interesting examples shortly. 

Exercise 1.3.4 Let $(V, \| \cdot \|)$ be a normed space.

(a) Prove that $\|x\| = \|{-x}\|$ for all $x \in V$.

(b) Prove that

$$\|x - y\| \ge \big| \|x\| - \|y\| \big| \quad \text{for all } x, y \in V$$

(c) Prove that the norm-mapping is continuous, i.e. that the map $V \to \mathbb{R} : x \mapsto \|x\|$ is continuous.

□ 

Conditions (i)-(iv) in the definition of a norm seem similar to the conditions (i)-(iv) in 
the definition of a metric. This is no accident: 

^2 Later in this course, we may have occasion to consider vector spaces over $\mathbb{C}$ as well.






Proposition 1.3.5 Every norm induces a metric: If $(V, \| \cdot \|)$ is a normed space, then $(V, d)$ is a metric space, where $d : V \times V \to \mathbb{R}$ is defined by

$$d(v, w) := \|v - w\|$$



Exercise 1.3.6 (a) Prove Propn. 1.3.5. 

(b) Prove that the converse of Propn. 1.3.5 is false: Not every metric on a vector space V is obtained from a 
norm. 

[Hint: Consider the discrete metric on a vector space V.] 

□ 

Here is a first look at function spaces: 

Exercise 1.3.7 Suppose that $[a, b]$ is a closed interval in $\mathbb{R}$. Let $C[a, b]$ be the set of all continuous functions $f : [a, b] \to \mathbb{R}$.

(a) Show that $C[a, b]$ is a vector space, when the operations of addition and scalar multiplication are defined pointwise.

(b) Define $\| \cdot \|_1 : C[a, b] \to \mathbb{R}$ by

$$\|f\|_1 := \int_a^b |f(t)| \, dt$$

Show that $\| \cdot \|_1$ is a norm on $C[a, b]$.

(c) Define $\| \cdot \|_\infty : C[a, b] \to \mathbb{R}$ by

$$\|f\|_\infty := \sup\{|f(t)| : t \in [a, b]\}$$

Show that $\| \cdot \|_\infty$ is a norm on $C[a, b]$.

□ 

Exercise 1.3.8 Let $\ell^1$ be the set of all sequences $(x_n)_n$ in $\mathbb{R}$ with the property that $\sum_n |x_n| < \infty$. Show that $\ell^1$ is a vector space, and that $\|(x_n)_n\|_1 := \sum_n |x_n|$ defines a norm on $\ell^1$.

□ 



1.4 Inner Product Spaces 

Next, we consider an additional structure on a real vector space V: 



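Definition 1.4.1 An inner product space is a pair $(V, \langle \cdot, \cdot \rangle)$, where $V$ is a vector space and $\langle \cdot, \cdot \rangle$ is an inner product on $V$, i.e. a function $\langle \cdot, \cdot \rangle : V \times V \to \mathbb{R}$ with the following properties:

(i) $\langle x, x \rangle \ge 0$ for all $x \in V$;

(ii) $\langle x, x \rangle = 0$ if and only if $x = 0$;

(iii) $\langle x, y \rangle = \langle y, x \rangle$ for all $x, y \in V$;

(iv) $\langle x, y + z \rangle = \langle x, y \rangle + \langle x, z \rangle$ for all $x, y, z \in V$;

(v) $\langle \alpha x, y \rangle = \alpha \langle x, y \rangle$ for all $x, y \in V$ and $\alpha \in \mathbb{R}$.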






Example 1.4.2 $\mathbb{R}^n$ is an inner product space when equipped with the usual dot product:

$$\langle x, y \rangle := x \cdot y = \sum_{j=1}^n x_j y_j$$

where $x = (x_1, \dots, x_n)$, $y = (y_1, \dots, y_n)$.

□ 

We shall shortly present more interesting examples of inner product spaces. For the moment, we note that every inner product induces a norm, in the same way that the dot product in $\mathbb{R}^n$ yields a length: The length of a vector $x \in \mathbb{R}^n$ is given by $\sqrt{x \cdot x}$, and it turns out that $\|x\| := \sqrt{\langle x, x \rangle}$ defines a norm in terms of the inner product.
To prove that, we need the following result:

Proposition 1.4.3 (Cauchy-Schwarz Inequality)^3
If $(V, \langle \cdot, \cdot \rangle)$ is an inner product space, then

$$|\langle x, y \rangle| \le \sqrt{\langle x, x \rangle} \, \sqrt{\langle y, y \rangle}$$

for all $x, y \in V$.

Moreover we have equality iff one of $x$, $y$ is a scalar multiple of the other.

^3 Also called the Cauchy-Bunyakovskii-Schwarz Inequality.

Exercise 1.4.4 We prove Propn. 1.4.3. Let $(V, \langle \cdot, \cdot \rangle)$ be an inner product space.

(a) First show that for $a \in \mathbb{R}$ and $x, y \in V$ we have

$$0 \le \langle x - ay, x - ay \rangle = \langle x, x \rangle - 2a \langle x, y \rangle + a^2 \langle y, y \rangle$$

(b) Now, with $x, y$ held fixed, consider the righthand side of the above inequality as a quadratic polynomial in $a$. By examining its discriminant, explain why

$$\langle x, y \rangle^2 - \langle x, x \rangle \langle y, y \rangle \le 0$$

with equality only when $x = ay$ for some $a$.

(c) Now conclude the result. 

□ 

Proposition 1.4.5 If $(V, \langle \cdot, \cdot \rangle)$ is an inner product space, then the map $\| \cdot \| : V \to \mathbb{R}$ given by

$$\|x\| := \sqrt{\langle x, x \rangle}$$

defines a norm on $V$.

Exercise 1.4.6 Prove Propn. 1.4.5.

□ 

Exercise 1.4.7 (a) Suppose that $(V, \langle \cdot, \cdot \rangle)$ is an inner product space, and that $\| \cdot \|$ is the norm induced by the inner product. Prove that if $x, y \in V$, then

$$\|x + y\|^2 + \|x - y\|^2 = 2(\|x\|^2 + \|y\|^2)$$

Why do you think this identity is called the Parallelogram Law? 






(b) Prove or Disprove: If $(V, \| \cdot \|)$ is a normed space, then it is possible to define an inner product $\langle \cdot, \cdot \rangle : V \times V \to \mathbb{R}$ such that $\| \cdot \|$ is the norm induced by this inner product.

(c) Prove or Disprove: If $(V, \| \cdot \|)$ is a normed space which satisfies the Parallelogram Law, then it is possible to define an inner product $\langle \cdot, \cdot \rangle : V \times V \to \mathbb{R}$ such that $\| \cdot \|$ is the norm induced by this inner product.

□ 
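The Parallelogram Law is easy to test numerically on $\mathbb{R}^2$. The sketch below computes the "defect" $\|x + y\|^2 + \|x - y\|^2 - 2(\|x\|^2 + \|y\|^2)$ for the Euclidean norm, the 1-norm and the max-norm. The function names and the choice of the two standard basis vectors are assumptions of the example.

```python
import math

def norm1(x):
    return sum(abs(a) for a in x)

def norm2(x):
    return math.sqrt(sum(a * a for a in x))

def norm_inf(x):
    return max(abs(a) for a in x)

def parallelogram_defect(norm, x, y):
    """||x+y||^2 + ||x-y||^2 - 2(||x||^2 + ||y||^2); zero iff the law holds for x, y."""
    s = [a + b for a, b in zip(x, y)]
    d = [a - b for a, b in zip(x, y)]
    return norm(s) ** 2 + norm(d) ** 2 - 2 * (norm(x) ** 2 + norm(y) ** 2)

x, y = [1.0, 0.0], [0.0, 1.0]
print(parallelogram_defect(norm2, x, y))      # zero up to rounding
print(parallelogram_defect(norm1, x, y))      # 4.0
print(parallelogram_defect(norm_inf, x, y))   # -2.0
```

A non-zero defect for a single pair of vectors already shows that the norm in question cannot come from an inner product, by part (a) of the exercise above.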

In $\mathbb{R}^n$, the dot product does not only induce a length; it also induces an angle: The angle $\theta$ between two vectors $x, y \in \mathbb{R}^n$ is given by

$$\cos \theta = \frac{x \cdot y}{\|x\| \, \|y\|}$$

We can imitate this definition in an abstract inner product space $(V, \langle \cdot, \cdot \rangle)$, and define the angle between $x, y \in V$ by

$$\cos \theta := \frac{\langle x, y \rangle}{\|x\| \, \|y\|} \quad \text{where } \|x\| := \sqrt{\langle x, x \rangle}$$

By the Cauchy-Schwarz inequality it follows immediately that $|\cos \theta| \le 1$, so that this definition makes sense. It also follows that $|\cos \theta| = 1$ if and only if $x$ is a scalar multiple of $y$, i.e. iff $x, y$ are parallel. We can also define orthogonality in an abstract inner product space, in the obvious way:

Definition 1.4.8 Suppose that $(V, \langle \cdot, \cdot \rangle)$ is an inner product space. We say that $x, y \in V$ are orthogonal, and write $x \perp y$, if and only if $\langle x, y \rangle = 0$.
If $G \subseteq V$, we say that $x \perp G$ iff $\forall g \in G \ (x \perp g)$.



The following exercise gives a nice example of an inner product space that we shall meet 
again later. It shows that several commonly used statistical terms are actually derived from 
inner product spaces. 

Exercise 1.4.9 Consider a random experiment — because we do not yet have the measure theoretic 
machinery required, we have to be a little imprecise here — and let V be the set of all random variables with 
zero mean and finite variance. 

(a) Show that V is a vector space. 

[Hint: Imitate the proof of the Cauchy-Schwarz inequality to show that if $X, Y \in V$, then $X + Y$ has finite variance.]

(b) Define $\langle \cdot, \cdot \rangle : V \times V \to \mathbb{R}$ by $\langle X, Y \rangle = \mathbb{E}[XY]$ (where $\mathbb{E}X$ denotes the expectation of $X$). Show that $\langle \cdot, \cdot \rangle$ is an inner product on $V$.

(c) This inner product induces a norm $\| \cdot \|$. What is the usual name for $\|X\|$ in statistics?

(d) The inner product also induces an angle $\theta$ between two random variables $X, Y \in V$. What is the usual name for $\cos \theta$ in statistics?

□ 

Exercise 1.4.10 Let $C[a, b]$ be the set of all continuous functions $f : [a, b] \to \mathbb{R}$. We already know from Exercise 1.3.7 that $C[a, b]$ is a vector space, and we defined two norms $\| \cdot \|_1$ and $\| \cdot \|_\infty$ on this space.

(a) Show that the Parallelogram Law fails for both $\| \cdot \|_1$ and $\| \cdot \|_\infty$. Thus these norms are not induced by an inner product.






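(b) Define a map $\langle \cdot, \cdot \rangle : C[a, b] \times C[a, b] \to \mathbb{R}$ by

$$\langle f, g \rangle := \int_a^b f(t) g(t) \, dt$$

Show that $\langle \cdot, \cdot \rangle$ defines an inner product on $C[a, b]$.

(c) The induced norm is therefore

$$\|f\|_2 := \left( \int_a^b f(t)^2 \, dt \right)^{1/2}$$

Compare the norms $\| \cdot \|_1$, $\| \cdot \|_2$ and $\| \cdot \|_\infty$ with the norms on $\mathbb{R}^n$ discussed in Example 1.3.2 and Exercise 1.3.3.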

□ 

Once we have introduced some measure theory, it will become apparent that Exercises 1.4.9 
and 1.4.10 are really instances of the same idea. 

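In $\mathbb{R}^n$, the standard basis $\{e_i : i = 1, \dots, n\}$ is orthonormal, i.e.

$$\langle e_i, e_j \rangle = \begin{cases} 0 & \text{if } i \ne j \text{ (orthogonal)} \\ 1 & \text{if } i = j \text{ (normal)} \end{cases}$$

Definition 1.4.11 Let $(V, \langle \cdot, \cdot \rangle)$ be an inner product space and let $U \subseteq V$. We say that $U$ is orthonormal iff for all $u, u' \in U$, we have

$$\langle u, u' \rangle = \begin{cases} 0 & \text{if } u \ne u' \\ 1 & \text{if } u = u' \end{cases}$$

Exercise 1.4.12 If $U$ is an orthonormal set, then the vectors in $U$ are mutually linearly independent.
[Hint: Suppose that $\sum_k c_k u_k = 0$ and consider $\langle u_j, 0 \rangle$.]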



□ 



Now assume that $(V, \langle \cdot, \cdot \rangle)$ is an inner product space which possesses an orthonormal basis $\{u_1, \dots, u_n\}$, so that $V$ is finite dimensional. It is then very easy to represent every vector in $V$ as a linear combination of these basis vectors: If $v = \sum_{k=1}^n c_k u_k$, then $\langle v, u_j \rangle = \sum_{k=1}^n c_k \langle u_k, u_j \rangle = c_j$, and so

$$v = \sum_{k=1}^n \langle v, u_k \rangle u_k$$



The next theorem shows that this can always be done in a finite dimensional inner product 
space: 



Theorem 1.4.13 (Gram-Schmidt Orthogonalization) If $(V, \langle \cdot, \cdot \rangle)$ is a finite dimensional inner product space, then $V$ has an orthonormal basis.

Proof: Suppose that $\{v_1, \dots, v_n\}$ is a basis of $V$. We proceed by inductively building an orthonormal basis $\{u_1, \dots, u_n\}$ so that

$$\operatorname{span}\{v_1, \dots, v_i\} = \operatorname{span}\{u_1, \dots, u_i\} \quad \text{for } i = 1, \dots, n$$






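Define $u_1 := \frac{v_1}{\|v_1\|}$. Then $\|u_1\| = 1$, and $\operatorname{span}\{u_1\} = \operatorname{span}\{v_1\}$.

Assume now that we have already defined $u_1, \dots, u_i$ (for $1 \le i < n$), so that $\{u_1, \dots, u_i\}$ is an orthonormal set with the same span as $\{v_1, \dots, v_i\}$. We must now define $u_{i+1}$. First define

$$w_{i+1} := v_{i+1} - \sum_{k=1}^{i} \langle v_{i+1}, u_k \rangle u_k$$

and note that

(i) $w_{i+1} \ne 0$, for otherwise $v_{i+1}$ would be a linear combination of $u_1, \dots, u_i$, and thus a linear combination of $v_1, \dots, v_i$. But $v_1, \dots, v_{i+1}$ is linearly independent.

(ii) If $1 \le j \le i$, then

$$\langle w_{i+1}, u_j \rangle = \langle v_{i+1}, u_j \rangle - \sum_{k=1}^{i} \langle v_{i+1}, u_k \rangle \langle u_k, u_j \rangle = 0$$

It follows that $u_1, \dots, u_i, w_{i+1}$ is an orthogonal set, and thus linearly independent.

(iii) As $\operatorname{span}\{u_1, \dots, u_i\} = \operatorname{span}\{v_1, \dots, v_i\}$, we see that $\operatorname{span}\{u_1, \dots, u_i, w_{i+1}\} = \operatorname{span}\{v_1, \dots, v_{i+1}\}$.

The only potential problem is that we might not have $\|w_{i+1}\| = 1$. Therefore, define

$$u_{i+1} := \frac{w_{i+1}}{\|w_{i+1}\|}$$

⊣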
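The inductive construction in the proof translates almost line by line into a short program. The following is a minimal sketch for $\mathbb{R}^n$ with the dot product, in plain Python. The function names, the tolerance `tol` used to detect a vanishing $w_{i+1}$, and the sample basis are assumptions of the example.

```python
import math

def inner(x, y):
    """Dot product on R^n."""
    return sum(a * b for a, b in zip(x, y))

def gram_schmidt(vectors, tol=1e-12):
    """Return an orthonormal list u_1, ..., u_n with the same spans as v_1, ..., v_n."""
    basis = []
    for v in vectors:
        # w = v - sum_k <v, u_k> u_k, summing over the u_k built so far.
        w = list(v)
        for u in basis:
            c = inner(v, u)
            w = [wi - c * ui for wi, ui in zip(w, u)]
        length = math.sqrt(inner(w, w))
        if length < tol:
            raise ValueError("input vectors are linearly dependent")
        # Normalise so that the new vector has length 1.
        basis.append([wi / length for wi in w])
    return basis

vs = [[1.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]
us = gram_schmidt(vs)

# Check orthonormality: <u_i, u_j> should be 1 if i == j and 0 otherwise.
for i, ui in enumerate(us):
    print([round(inner(ui, uj), 10) for uj in us])

# Recover v_3 from its coefficients <v_3, u_k>.
coeffs = [inner(vs[2], u) for u in us]
rebuilt = [sum(c * u[j] for c, u in zip(coeffs, us)) for j in range(3)]
print([round(r, 10) for r in rebuilt])
```

In floating-point arithmetic the computed vectors are orthonormal only up to rounding error. For badly conditioned inputs a variant that subtracts the projections from the running vector `w` (the so-called modified Gram-Schmidt process) is often preferred, but that goes beyond what the proof needs.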

The next exercise shows that in a finite-dimensional inner product space, inner products 
are essentially just dot products: 

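Exercise 1.4.14 Suppose that $\{u_1, \dots, u_n\}$ is an orthonormal basis for $V$. If $v = \sum_{k=1}^n v_k u_k$ and $w = \sum_{k=1}^n w_k u_k$, then

(a) $\langle v, w \rangle = \sum_{k=1}^n v_k w_k$

(b) $\langle v, w \rangle = \sum_{k=1}^n \langle v, u_k \rangle \langle u_k, w \rangle$

(c) $\|v\| = \left( \sum_{k=1}^n \langle v, u_k \rangle^2 \right)^{1/2}$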

□ 

At a later stage, we will encounter these ideas — orthonormal representations of vectors 
in inner product spaces — again: in infinite-dimensional function spaces, when we discuss 
Fourier analysis. 






1.5 Linear Operators 

Suppose that $V, W$ are vector spaces. A linear operator $T : V \to W$ is one which preserves the operations of addition and scalar multiplication, i.e. one which satisfies

$$T(x + y) = Tx + Ty \qquad T(ax) = aTx$$

We now tackle the question of when a linear operator is continuous. To do that, we require additional structure on $V$ and $W$, because continuity involves the idea of "closeness". We will henceforth consider linear operators between normed spaces.

By translating the definition of continuity from the language of metrics to the language of norms, we see that a map $T : (V, \| \cdot \|_V) \to (W, \| \cdot \|_W)$ is continuous if and only if

$$\forall x \in V \ \forall \varepsilon > 0 \ \exists \delta > 0 \ \forall y \in V \ [\|x - y\|_V < \delta \Rightarrow \|Tx - Ty\|_W < \varepsilon]$$

When $T$ is linear, we have $Tx - Ty = T(x - y)$. Setting $h := x - y$, we thus obtain

$$\|h\|_V < \delta \Rightarrow \|Th\|_W < \varepsilon$$

This suggests the following definition:

Definition 1.5.1 A linear operator $T : (V, \| \cdot \|_V) \to (W, \| \cdot \|_W)$ between normed vector spaces is said to be bounded if and only if there is a constant $C \in \mathbb{R}$ such that $\|Tx\|_W \le C \|x\|_V$ for all $x \in V$.

To simplify notation, we will use the same symbol $\| \cdot \|$ for $\| \cdot \|_V$, $\| \cdot \|_W$, etc. This should not cause any confusion; you need merely look where the vectors reside. Thus if $T : V \to W$ and $x \in V$, then $\|x\| = \|x\|_V$ and $\|Tx\| = \|Tx\|_W$.

The next lemma shows that there are plenty of bounded linear operators: 

Lemma 1.5.2 Every linear operator defined on a Euclidean space is bounded, i.e. if $T : (\mathbb{R}^n, \| \cdot \|) \to (V, \| \cdot \|)$ is a linear transformation, then there exists a constant $C$ such that

$$\|Tx\| \le C \|x\| \quad \text{for all } x \in \mathbb{R}^n$$

(where the norm on $\mathbb{R}^n$ is the standard Euclidean norm.)

Proof: Let $e_1, \dots, e_n$ denote the standard basis of $\mathbb{R}^n$, and put $C = n \max\{\|Te_1\|, \dots, \|Te_n\|\}$, so that each $\|Te_i\| \le \frac{C}{n}$. If $h = (h_1, \dots, h_n)^T \in \mathbb{R}^n$, then $h = \sum_{i=1}^n h_i e_i$ and each $|h_i| = \sqrt{h_i^2} \le \sqrt{h_1^2 + \dots + h_n^2} = \|h\|$. It follows, using the triangle inequality and the inequalities just obtained, that

$$\|Th\| = \Big\| T\Big( \sum_{i=1}^n h_i e_i \Big) \Big\| \le \sum_{i=1}^n |h_i| \, \|Te_i\| \le \sum_{i=1}^n \|h\| \frac{C}{n} = C \|h\|$$

⊣
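The constant $C = n \max_i \|Te_i\|$ from the proof can be computed explicitly when $T$ is given by a matrix, since $Te_i$ is then the $i$-th column. The sketch below does this for one operator $T : \mathbb{R}^3 \to \mathbb{R}^2$ and spot-checks the bound on random vectors. The matrix `A`, the helper names and the rounding tolerance are assumptions of the example.

```python
import math
import random

def norm(x):
    """Euclidean norm."""
    return math.sqrt(sum(a * a for a in x))

def apply(A, h):
    """Apply the matrix A (a list of rows) to the vector h."""
    return [sum(a * b for a, b in zip(row, h)) for row in A]

# An illustrative linear operator T : R^3 -> R^2.
A = [[2.0, -1.0, 0.0],
     [0.5,  3.0, 1.0]]
n = 3

# T e_i is the i-th column of A.
columns = [[row[i] for row in A] for i in range(n)]
C = n * max(norm(col) for col in columns)
print(C)

# Spot-check ||T h|| <= C ||h|| on random vectors h.
random.seed(0)
bound_ok = all(
    norm(apply(A, h)) <= C * norm(h) + 1e-12
    for h in ([random.uniform(-5, 5) for _ in range(n)] for _ in range(1000))
)
print(bound_ok)
```

The constant produced by the proof is far from optimal in general; the lemma only needs some finite $C$ to exist.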

The next proposition shows that a linear operator is continuous if and only if it is bounded: 

Proposition 1.5.3 Let $T : V \to W$ be a linear operator between normed vector spaces $V, W$. Then $T$ is continuous if and only if it is bounded.






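Proof: Suppose $T$ is continuous. Choose $\delta > 0$ so that $\|Tx\| < 1$ whenever $\|x\| < \delta$. Let $C > \frac{1}{\delta}$. If $x \in V$ is non-zero, then $\left\| \frac{x}{C\|x\|} \right\| = \frac{1}{C} < \delta$, so $\left\| T\left( \frac{x}{C\|x\|} \right) \right\| < 1$, i.e. $\|Tx\| < C\|x\|$. For $x = 0$ the bound $\|Tx\| \le C\|x\|$ holds trivially.

Conversely, suppose that $T$ is a bounded operator, and that $\|Tx\| \le C\|x\|$ for all $x \in V$.

To show that $T$ is continuous, it suffices to show that $Tx_n \to Tx$ whenever $x_n \to x$, i.e. that $\|Tx_n - Tx\| \to 0$ whenever $\|x_n - x\| \to 0$. But this is easy:

$$\|Tx_n - Tx\| = \|T(x_n - x)\| \le C\|x_n - x\| \to 0 \quad \text{as } \|x_n - x\| \to 0$$

⊣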

Lemma 1.5.2 and Proposition 1.5.3 immediately imply that:

Corollary 1.5.4 Any linear operator $L : \mathbb{R}^n \to \mathbb{R}^m$ is continuous.

Remarks 1.5.5 Since every finite dimensional real vector space is isomorphic to a Euclidean space, the 
preceding corollary proves that all linear operators defined on finite dimensional normed vector spaces are 
continuous. This breaks down for infinite dimensional vector spaces, as you will see in the next exercise. 

□ 

Exercise 1.5.6 Consider the subspace C^[a, 6] C C[o, 6] which consists of all continuously differentiable^ 
functions / : [a,h\ R. Equip C^[a,6] with the || • ||oo-norm, i.e. ||/||oo := sup(g[„ ;,] For to € (o,6), 

define the map Dtg : C^[a, 6] — > K by 

Dtof = f'ito) 

(a) Show that Dt^ is linear. 

(b) Show that Dt„ is not generally continuous. 

[Hint: Define $f_n(x) := \frac{1}{n} \sin 2\pi n x$. Then $f_n \to 0$. Take $t_0 = \frac{1}{2}$.]

□ 
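To get a feeling for part (b), one can tabulate a candidate sequence numerically. The sketch below is only an illustration under stated assumptions: it takes $[a,b] = [0,1]$, reads the hint as $f_n(x) = \frac{1}{n}\sin 2\pi n x$ with $t_0 = \frac{1}{2}$, and estimates the sup-norm on a finite grid (so the printed norms are approximations from below).

```python
import math

def f(n, x):
    """Assumed reading of the hint: f_n(x) = sin(2*pi*n*x) / n."""
    return math.sin(2 * math.pi * n * x) / n

def f_prime(n, x):
    """Derivative of f_n, computed by hand: 2*pi*cos(2*pi*n*x)."""
    return 2 * math.pi * math.cos(2 * math.pi * n * x)

def sup_norm_estimate(n, a=0.0, b=1.0, grid=10_000):
    """Grid estimate of the sup-norm of f_n on [a, b]."""
    return max(abs(f(n, a + (b - a) * k / grid)) for k in range(grid + 1))

t0 = 0.5
for n in (1, 10, 100):
    # The norms shrink like 1/n, while |f_n'(t0)| stays at 2*pi.
    print(n, sup_norm_estimate(n), f_prime(n, t0))
```

So $f_n \to 0$ in the $\|\cdot\|_\infty$-norm while $D_{t_0} f_n$ does not tend to $D_{t_0} 0 = 0$; turning this observation into a proof is the content of the exercise.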

1.6 Projections in Hilbert Spaces 

We end this chapter with an important concept: that of orthogonal projection. 

If $V$ is a linear subspace of $\mathbb{R}^n$, then we can project any $x \in \mathbb{R}^n$ onto $V$. That is, we can
represent $x$ as a sum

$$x = x^{\parallel} + x^{\perp} \quad \text{where } x^{\parallel} \in V,\ x^{\perp} \perp V$$

One can think of $x^{\parallel}$ as the best approximation to $x$ in $V$: It is the vector in $V$ which lies
closest to $x$.

This can also be done in an inner product space, provided that the subspace is complete. Recall that a metric space is said to be complete if and only if every Cauchy sequence converges. Since every inner product induces a norm, and since every norm induces a metric, it makes sense to talk about complete inner product spaces and complete normed spaces. Specifically: A sequence $(v_n)$ in a normed space $(V, \|\cdot\|)$ is a Cauchy sequence iff $\sup_{n,m \ge N} \|v_n - v_m\| \to 0$ as $N \to \infty$. A normed space $(V, \|\cdot\|)$ is complete iff every Cauchy sequence in $V$ converges (to a vector in $V$).

Complete spaces are important enough to warrant names: 

• A complete normed space is called a Banach space.

• A complete inner product space is called a Hilbert space.

¹ i.e. the derivative exists and is continuous.

A subspace $W$ of a Banach (Hilbert) space $(V, \langle\cdot,\cdot\rangle)$ need not itself be a Banach (Hilbert)
space. If it is, the subspace is said to be closed.

Example 1.6.1 We will shortly see that the space $(C[-1,1], \|\cdot\|_\infty)$ is a Banach space. The absolute value
function $f(x) := |x|$ can be approximated uniformly by continuously differentiable functions, i.e. there exists
a sequence $f_n$ of continuously differentiable functions such that $\|f_n - f\|_\infty \to 0$. Thus the space $C^1[-1,1]$ of
continuously differentiable functions is a subspace of $C[-1,1]$ which is not closed: $(f_n)_n$ is a Cauchy sequence
in $C^1[-1,1]$ which does not converge to a point (function) in $C^1[-1,1]$.

□ 

Suppose that $V$ is a Hilbert space, and that $W$ is a closed linear subspace of $V$. If $v_0 \in V$,
we can find the best approximation of $v_0$ in $W$. This is the unique vector $w_0$ with the properties
that

(i) $w_0 \in W$, and

(ii) $\|v_0 - w_0\| = \inf\{\|v_0 - w\| : w \in W\}$, i.e. $w_0$ is the vector in $W$ that lies closest to $v_0$.

(iii) Moreover, $(v_0 - w_0) \perp W$.

The vector $w_0$ satisfying (i)-(iii) is called the orthogonal projection of $v_0$ onto $W$. Indeed,
$v_0 = w_0 + (v_0 - w_0)$ decomposes $v_0$ into a vector in $W$ and a vector orthogonal to $W$. It
remains to show that orthogonal projections exist and are unique.



Proposition 1.6.2 Let $V$ be a Hilbert space, and let $W$ be a closed linear subspace of $V$.
Then any $v_0$ in $V$ has a unique decomposition

$$v_0 = v_0^{\parallel} + v_0^{\perp} \quad \text{where } v_0^{\parallel} \in W,\ v_0^{\perp} \perp W$$

$v_0^{\parallel}$ is called the orthogonal projection of $v_0$ onto $W$.

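Proof: Uniqueness: If

$$v_0 = v_0^{\parallel} + v_0^{\perp} = u_0^{\parallel} + u_0^{\perp}$$

where $v_0^{\parallel}, u_0^{\parallel} \in W$ and $v_0^{\perp}, u_0^{\perp} \perp W$, then

$$v_0^{\parallel} - u_0^{\parallel} = u_0^{\perp} - v_0^{\perp} =: x$$

is a vector with the properties that $x \in W$ and that $x \perp W$. This implies that $x \perp x$, i.e.
that $\langle x, x \rangle = 0$. Hence $x = 0$, and so $v_0^{\parallel} = u_0^{\parallel}$, $v_0^{\perp} = u_0^{\perp}$.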

Existence: Let $\delta = \inf\{\|v_0 - w\| : w \in W\}$, and choose a sequence $w_n \in W$ such that
$\|v_0 - w_n\| \to \delta$. We show that $(w_n)_n$ is a Cauchy sequence in $W$: for if $\varepsilon > 0$, we may choose
$N$ such that $\|v_0 - w_n\|^2 - \delta^2 < \varepsilon$ whenever $n \ge N$. By the Parallelogram Law it follows that
if $n, m \ge N$, then

$$2\varepsilon + 2\delta^2 > \|v_0 - w_n\|^2 + \|v_0 - w_m\|^2 = 2\big\|v_0 - \tfrac{1}{2}(w_n + w_m)\big\|^2 + 2\big\|\tfrac{1}{2}(w_n - w_m)\big\|^2 \ge 2\delta^2 + \tfrac{1}{2}\|w_n - w_m\|^2$$

where the last inequality holds because $\tfrac{1}{2}(w_n + w_m) \in W$. Hence $\|w_n - w_m\|^2 < 4\varepsilon$ for all $n, m \ge N$.

Since $(w_n)_n$ is a Cauchy sequence, and since $W$ is closed, there is $w_0 \in W$ such that $w_n \to w_0$.
We will show that $w_0 = v_0^{\parallel}$. The fact that $\|v_0 - w_0\| \le \|v_0 - w_n\| + \|w_n - w_0\|$ (for all $n \in \mathbb{N}$)
then is easily seen to imply that $\|v_0 - w_0\| = \delta$.

It remains to show that $v_0 - w_0 \perp W$. Given an arbitrary $w \in W$ and $\lambda \in \mathbb{R}$, we have
$\|v_0 - w_0\|^2 = \delta^2 \le \|v_0 - (w_0 + \lambda w)\|^2$, so that

$$-2\lambda \langle v_0 - w_0, w \rangle + \lambda^2 \|w\|^2 \ge 0$$

Since this holds for all $\lambda$ we must have $\langle v_0 - w_0, w \rangle = 0$. (Another way to see this is to note
that the quadratic in $\lambda$ has a unique root at $\lambda = 0$, and to calculate the discriminant.)

■
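In $\mathbb{R}^n$ the decomposition of Proposition 1.6.2 can be computed explicitly. The following is a minimal sketch, not the construction used in the proof: it assumes that $W$ is given by a finite list of spanning vectors, orthonormalizes them with the Gram-Schmidt procedure, and sums the components of $v_0$ along the resulting basis. The helper names `dot`, `gram_schmidt` and `project`, and the sample vectors, are hypothetical.

```python
import math

def dot(u, v):
    """Standard inner product on R^n."""
    return sum(a * b for a, b in zip(u, v))

def gram_schmidt(vectors, tol=1e-12):
    """Return an orthonormal basis of span(vectors)."""
    basis = []
    for v in vectors:
        w = list(v)
        for e in basis:
            c = dot(w, e)
            w = [wi - c * ei for wi, ei in zip(w, e)]
        length = math.sqrt(dot(w, w))
        if length > tol:  # skip vectors that are (numerically) dependent
            basis.append([wi / length for wi in w])
    return basis

def project(v0, spanning_vectors):
    """Orthogonal projection of v0 onto W = span(spanning_vectors)."""
    w0 = [0.0] * len(v0)
    for e in gram_schmidt(spanning_vectors):
        c = dot(v0, e)
        w0 = [wi + c * ei for wi, ei in zip(w0, e)]
    return w0

W = [(1.0, 0.0, 0.0), (1.0, 1.0, 0.0)]  # spans the plane z = 0
v0 = (2.0, 3.0, 4.0)
w0 = project(v0, W)
residual = [a - b for a, b in zip(v0, w0)]
print(w0, residual)
```

Here `residual` plays the role of $v_0 - w_0$: it is orthogonal to both spanning vectors, and projecting `w0` a second time leaves it unchanged.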

Remarks 1.6.3 An examination of the proof above shows that we require only that $W$ is complete, i.e. if
$W$ is a complete subspace of an inner product space $V$, then orthogonal projections onto $W$ exist.

□ 

Exercise 1.6.4 Suppose that $W$ is a closed subspace of a Hilbert space $(V, \langle\cdot,\cdot\rangle)$. For each $v \in V$, let $v^{\parallel}$
be the orthogonal projection of $v$ onto $W$. Define a map $P : V \to V$ by $Pv := v^{\parallel}$. Use the uniqueness of the
orthogonal projection to prove the following results.

(a) Show that $P$ is a bounded linear operator.

(b) Show that $P$ is idempotent, i.e. that $P^2 = P$ (i.e. $P(Pv) = Pv$ for all $v \in V$).

(c) Show that $P$ is self-adjoint, i.e. that $\langle Pv_1, v_2 \rangle = \langle Pv_1, Pv_2 \rangle = \langle v_1, Pv_2 \rangle$ for all $v_1, v_2 \in V$.

(d) Show that $\ker P = (\operatorname{ran} P)^{\perp} = W^{\perp}$.

□ 

Remarks 1.6.5 A bounded linear operator $P : V \to V$ satisfying (b), (d) in Exercise 1.6.4 is called a
projection. The exercise shows that every orthogonal projection is a projection. It can be shown that every
projection $P$ is an orthogonal projection, onto the space $\operatorname{ran} P$.

□ 

We will encounter these ideas again later, when we discuss conditional expectation of random 
variables. 



Conditioning = Projecting! 






Chapter 2 

Basic Notions of Topology 



2.1 Countable and Uncountable Sets 

In this section, we investigate the idea of the cardinality (or size) of a set, with particular 
emphasis on countable sets. We will need these ideas to define separable topological spaces, 
as well as to define the notion of measure space, later in this course. 

For finite sets, we can determine the size of a set by counting its elements. Thus for
example, the set $\{a, b, c\}$ has cardinality 3 (it has 3 elements). We are going to extend this
idea of counting to obtain the size of infinite sets, and we will see that infinity comes in many
sizes.

First, we explore the idea of counting: For the moment, let $\mathbf{n} = \{1, 2, \dots, n\}$ be the set of
the first $n$ natural numbers. To say that $A = \{a, b, c\}$ has 3 elements is equivalent to saying
that there is a one-to-one correspondence between the sets $A$ and $\mathbf{3}$. Indeed, this is the heart
of the idea of counting: When we count the elements of $A$, we are setting up a bijection
between $A$ and $\mathbf{3}$. When we count "One, two, three", pointing our finger at $a, b, c$, we are
defining a map

$$f : A \to \mathbf{3} : a \mapsto 1,\ b \mapsto 2,\ c \mapsto 3$$

Thus the idea of counting the elements of a finite set $X$ involves finding a bijection between
$X$ and some $\mathbf{n}$. If there is a bijection from $X$ to $\mathbf{n}$, then $X$ has $n$ elements.

Mathematicians often start counting at zero, i.e. in the mathematical literature, the sets
$\mathbf{n}$ are usually defined as

$$\mathbf{n} = \{0, 1, 2, \dots, n-1\}$$

(We then do not need the number $n$ to define the set $\mathbf{n}$.) This is the convention that we shall adopt henceforth.

It is obvious that two finite sets $A$ and $\mathcal{A}$ have the same size if and only if there is a
one-to-one correspondence $f : A \to \mathcal{A}$. We don't even have to count $A$ and $\mathcal{A}$ to know that
they have the same number of elements. If $A = \{a, b, c, d\}$ and $\mathcal{A} = \{\alpha, \beta, \gamma, \delta\}$, then the
existence of the bijection $f : A \to \mathcal{A}$ given by

$$f(a) = \beta,\ f(b) = \delta,\ f(c) = \alpha,\ f(d) = \gamma$$

is sufficient to show that $A$ and $\mathcal{A}$ have the same number of elements. It doesn't tell us that
this number is 4. Thus two sets have the same size if and only if there is a bijection between
them; we can bypass the idea of number. This is important, because we cannot actually count
infinite sets. But we can establish bijective correspondences between infinite sets. We shall
adopt this idea as our basic idea of size.






Definition 2.1.1 We define an equivalence relation $\approx$ between sets as follows: If $A, B$ are
sets, we say that $A \approx B$ if and only if there is a bijection from $A$ to $B$. If $A \approx B$, we say
that $A$ and $B$ have the same cardinality. We may also indicate this by saying $|A| = |B|$.

Note that having the same cardinality is an equivalence relation between sets, i.e. that

(i) $|A| = |A|$ (Reflexivity)

(ii) If $|A| = |B|$, then $|B| = |A|$ (Symmetry)

(iii) If $|A| = |B|$ and $|B| = |C|$, then $|A| = |C|$ (Transitivity)

Exercise 2.1.2 Prove this assertion. (Note that the assertion is not obvious: When we say that $|A| = |B|$,
we are not actually claiming that there are two equal numbers. What we are saying is that there is a bijection
from $A$ to $B$. To prove (i), for example, you have to find a bijection from $A$ to $A$.)

□ 

Examples 2.1.3 (a) Two finite sets have the same cardinality if and only if they have the same number
of elements.

(b) For finite sets, if $A$ is a proper subset of $B$, then $|A| < |B|$. This breaks down completely for infinite sets.
Consider, for example, the sets $\mathbb{N}$ and $\mathbb{Z}$. It is certainly true that $\mathbb{N} \subset \mathbb{Z}$. However, the map $f : \mathbb{N} \to \mathbb{Z}$
defined by

$$f(n) = \begin{cases} \dfrac{n}{2} & \text{if } n \text{ is even} \\[4pt] -\dfrac{n-1}{2} & \text{if } n \text{ is odd} \end{cases}$$

is a bijection: $f(1) = 0, f(2) = 1, f(3) = -1, f(4) = 2, f(5) = -2, f(6) = 3, \dots$ (Note that we are
zig-zagging from the positive integers to the negative integers.) Thus $\mathbb{N}$ and $\mathbb{Z}$ have the same cardinality,
even though $\mathbb{N}$ is a proper subset of $\mathbb{Z}$.

(c) We also have $|\mathbb{Q}| = |\mathbb{N}|$. This can be seen as follows. Put the set of strictly positive rational numbers $\mathbb{Q}^+$
in an array

1/1 2/1 3/1 4/1 5/1 ...

1/2 2/2 3/2 4/2 5/2 ...

1/3 2/3 3/3 4/3 5/3 ...

1/4 2/4 3/4 4/4 5/4 ...

1/5 2/5 3/5 4/5 5/5 ...

We can then trace a zig-zag path that moves through all the rational numbers as follows. Start at the
top line and move diagonally down to the left until you reach the leftmost line. Repeat. We thus obtain
a sequence

$$\frac{1}{1}, \frac{2}{1}, \frac{1}{2}, \frac{3}{1}, \frac{2}{2}, \frac{1}{3}, \frac{4}{1}, \frac{3}{2}, \frac{2}{3}, \frac{1}{4}, \frac{5}{1}, \dots$$

All of the strictly positive rational numbers occur in this sequence, and they all occur infinitely many
times. For example, $\frac{1}{1}, \frac{2}{2}, \frac{3}{3}, \dots$ lie along the diagonal, and they are all equal. To obtain a bijection from
$\mathbb{N}$ to $\mathbb{Q}^+$, we follow the above sequence of rationals, but we omit any number that has already occurred
to ensure that the function is one-to-one, i.e. we prune away the repeated values. We therefore define the
function $f : \mathbb{N} \to \mathbb{Q}^+$ by

$$f(1) = \frac{1}{1},\ f(2) = \frac{2}{1},\ f(3) = \frac{1}{2},\ f(4) = \frac{3}{1},\ f(5) = \frac{1}{3},\ f(6) = \frac{4}{1},\ \dots$$

Note that $f(5) \ne \frac{2}{2}$, which is after $f(4) = \frac{3}{1}$ in the sequence, because $\frac{2}{2} = \frac{1}{1}$ already occurred as $f(1)$.
Then $f$ is a bijection from $\mathbb{N}$ to $\mathbb{Q}^+$. Now even though we haven't found a formula for $f$, it is nevertheless
a perfectly good function, and all its values can be calculated. Can you see that $f(16) = \frac{2}{5}$?







In the same way, we can set up a bijection $g$ from $\mathbb{N}$ to the negative rationals. Just put $g(n) = -f(n)$.
Finally, we can define a bijection $h : \mathbb{N} \to \mathbb{Q}$ using $f$, $g$ and another zig-zag: We define

$$h(1) = 0,\ h(2) = f(1),\ h(3) = g(1),\ h(4) = f(2),\ h(5) = g(2),\ h(6) = f(3),\ h(7) = g(3),\ \dots$$

Again, we have no formula for $h$, but it is certainly a well-defined function, and all its values can be
calculated. Check that $h(23) = -\frac{1}{5}$.

(d) If $A$ is any set, finite or infinite, then $\mathcal{P}(A) \approx 2^A$. (Recall that $2^A$ is the set of all functions from $A$ to
$\mathbf{2} = \{0, 1\}$.) This can be seen as follows: If $B \subseteq A$, define the indicator function $I_B : A \to \mathbf{2}$ by

$$I_B(a) = \begin{cases} 1 & \text{if } a \in B \\ 0 & \text{else} \end{cases}$$

Clearly $I_B = I_C$ if and only if $B = C$, and so the map $\mathcal{I} : \mathcal{P}(A) \to 2^A$ defined by $\mathcal{I}(B) = I_B$ is an
injection. Now suppose that $\chi \in 2^A$, i.e. $\chi : A \to \{0, 1\}$. Define a subset $B \subseteq A$ by

$$a \in B \iff \chi(a) = 1$$

It is clear that $\mathcal{I}(B) = I_B = \chi$, and thus that $\mathcal{I}$ is surjective as well. This proves that $|\mathcal{P}(A)| = |2^A|$.

□ 
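The functions in Examples 2.1.3(b) and (c) have no convenient closed formula, but they are easy to compute. The following sketch is one possible implementation, under the assumption that "pruning repeated values" is done by skipping every fraction that is not in lowest terms (such a fraction equals one that occurred on an earlier diagonal). The names `int_zigzag`, `positive_rationals`, `f` and `h` are illustrative.

```python
from fractions import Fraction
from itertools import count, islice
from math import gcd

def int_zigzag(n):
    """Example 2.1.3(b): n/2 for even n, -(n-1)/2 for odd n (n = 1, 2, ...)."""
    return n // 2 if n % 2 == 0 else -((n - 1) // 2)

def positive_rationals():
    """Yield f(1), f(2), ...: walk each diagonal p + q = s of the array from
    the top row down to the left, skipping fractions not in lowest terms."""
    for s in count(2):
        for q in range(1, s):
            p = s - q
            if gcd(p, q) == 1:
                yield Fraction(p, q)

def f(n):
    """The n-th strictly positive rational in the pruned zig-zag order."""
    return next(islice(positive_rationals(), n - 1, None))

def h(n):
    """h(1) = 0, h(2k) = f(k), h(2k+1) = g(k) = -f(k)."""
    if n == 1:
        return Fraction(0)
    k = n // 2
    return f(k) if n % 2 == 0 else -f(k)

print([int_zigzag(n) for n in range(1, 7)])  # [0, 1, -1, 2, -2, 3]
print([str(f(n)) for n in range(1, 7)])      # ['1', '2', '1/2', '3', '1/3', '4']
print(f(16), h(23))                          # 2/5 -1/5
```

Calling `f(n)` restarts the enumeration each time, which is wasteful but keeps the sketch short.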



Definition 2.1.4 A set $A$ is said to be countable if it is either finite or can be put into
a one-to-one correspondence with the natural numbers, i.e. if $|A| = |\mathbf{n}|$ for some $n \in \mathbb{N}$, or $|A| = |\mathbb{N}|$.



Remarks 2.1.5 (a) Basically a set $A$ is countable if its elements can be indexed by the
natural numbers, i.e. if it can be written as $A = \{a_n : n \in \mathbb{N}\}$. For if $A$ is countable and
not finite, then there is a bijection $f : \mathbb{N} \to A$, and we can take $a_n = f(n)$. Conversely, if
$A = \{a_n : n \in \mathbb{N}\}$ is infinite, we can define a bijection from $\mathbb{N}$ to $A$ by letting $f(n) = a_n$
(although here some pruning is necessary if the $a_n$ aren't all distinct; see Example
2.1.3(c)).

(b) In Examples 2.1.3, we proved that the sets $\mathbb{Z}$ and $\mathbb{Q}$ are countable sets.

(c) The "zig-zag" technique, used above to prove that the rational numbers are countable,
is often very useful.



□ 



Exercise 2.1.6 (a) Show that a non-empty set $A$ is countable iff there exists a surjection $f : \mathbb{N} \to A$.
(b) Show that a set $A$ is countable iff there exists an injection $g : A \to \mathbb{N}$.

□



A very basic question that arises is the following: Are all infinite sets countable? As we 
shall see, the answer is "No!" 

Example 2.1.7 We show that the unit interval $I = [0,1]$ is uncountable, i.e. that we cannot find an
enumeration

$$I = \{x_n : n \in \mathbb{N}\}$$

The proof is by contradiction: Suppose that we can find such an enumeration $I = \{x_1, x_2, x_3, x_4, \dots\}$, i.e. that
every real number in $[0,1]$ is equal to $x_n$ for some $n$. Now every number $x_n$ has a decimal expansion of the
form

$$x_n = 0.x_{n1}x_{n2}x_{n3}x_{n4}x_{n5}\dots$$

where $x_{nm}$ is the $m$-th digit in the decimal expansion of $x_n$. Of course some real numbers have two distinct
decimal expansions, a terminating one and a non-terminating one. For example, $1.0000\dots = 0.9999\dots$. We
will choose the non-terminating decimal expansions for our $x_n$.

We now create a new real number $x$ from the $x_n$ by a process called diagonalization. We choose $a_n \in
\{1, 2, \dots, 9\}$ such that the following hold:

$$a_1 \ne x_{11},\ a_2 \ne x_{22},\ a_3 \ne x_{33},\ \dots,\ a_n \ne x_{nn},\ \dots$$

To avoid a situation where we obtain a number $x$ with a terminating decimal expansion, we haven't permitted
$a_n = 0$; this is just a technicality. We can now define $x$: Put

$$x = 0.a_1a_2a_3a_4\dots$$

Here comes the heart of the argument: Clearly $x \in I = [0,1]$. Now if $I$ can be written as a list $\{x_1, x_2, x_3, \dots\}$,
then there must be some $n$ such that $x = x_n$. But the first decimal place of $x$ differs from the first decimal
place of $x_1$, since $a_1 \ne x_{11}$; hence $x \ne x_1$. Similarly, the second decimal place of $x$ differs from the second
decimal place of $x_2$, since $a_2 \ne x_{22}$; hence $x \ne x_2$. We can continue in this way to show that $x \ne x_n$ for any
$n \in \mathbb{N}$, i.e. $x$ is not on the list $\{x_1, x_2, x_3, \dots\}$.

Given any list of real numbers in $[0,1]$, this technique allows us to produce a real number $x$
that is not on the list. It thus follows that there can be no list containing all the real numbers in $[0,1]$, i.e.
there is no bijection from $\mathbb{N}$ to $[0,1]$.

□ 
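The diagonal construction is completely mechanical, as the following sketch shows for a finite piece of a list. It is only an illustration: a program can inspect finitely many digits of finitely many numbers, whereas the argument above concerns infinite expansions. The sample digit strings and the rule "use 1 unless the diagonal digit is 1, then use 2" are arbitrary choices.

```python
def diagonal_digits(rows):
    """rows[n][m] is the (m+1)-th decimal digit of the (n+1)-th listed number.
    Return digits a_n in {1, ..., 9} with a_n different from rows[n][n]."""
    return [1 if row[n] != 1 else 2 for n, row in enumerate(rows)]

# First ten decimal digits of five hypothetical listed numbers in [0, 1].
listed = ["1415926535", "7182818284", "4142135623", "4999999999", "3333333333"]
rows = [[int(c) for c in s] for s in listed]

a = diagonal_digits(rows)
x = "0." + "".join(str(d) for d in a)
print(x)  # 0.22111

# x differs from the n-th listed number in the n-th decimal place.
print(all(a[n] != rows[n][n] for n in range(len(rows))))  # True
```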

Hence there are uncountable sets. Clearly $\mathbb{R}$ is also uncountable, because otherwise we
could find an enumeration $\{r_1, r_2, r_3, \dots\}$ of $\mathbb{R}$. By omitting any reals which are not in $[0,1]$,
we could prune this into an enumeration of $[0,1]$, and such an enumeration does not exist.

Here is another way of producing uncountable sets: 

Exercise 2.1.8 (a) Suppose that $A$ is a set. Show that there is no bijection $A \to \mathcal{P}(A)$. Conclude that $\mathcal{P}(A)$
is strictly larger than $A$.

[Hint: Given $f : A \to \mathcal{P}(A)$, define $B := \{a \in A : a \notin f(a)\}$. Then $B \in \mathcal{P}(A)$, but $B \ne f(a)$ for any
$a \in A$.]

(b) Hence show that $\mathcal{P}(\mathbb{N})$ and $2^{\mathbb{N}}$ are uncountable sets.

□ 
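For a small finite set, the claim in the hint can be checked by brute force. The sketch below assumes $A = \{0, 1, 2\}$ (an arbitrary choice) and runs through all $8^3 = 512$ maps $f : A \to \mathcal{P}(A)$, verifying each time that the set $B = \{a \in A : a \notin f(a)\}$ is not a value of $f$. This is of course no substitute for the proof, which must work for every set $A$.

```python
from itertools import combinations, product

def powerset(A):
    """All subsets of the finite set A, as frozensets."""
    items = sorted(A)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def diagonal_set(A, f):
    """B = {a in A : a not in f(a)}, where f is a dict from A to subsets of A."""
    return frozenset(a for a in A if a not in f[a])

A = {0, 1, 2}
elements = sorted(A)
subsets = powerset(A)

# Each tuple `images` encodes one map f via f(elements[i]) = images[i].
missed_every_time = all(
    diagonal_set(A, dict(zip(elements, images))) not in images
    for images in product(subsets, repeat=len(elements))
)
print(len(subsets), missed_every_time)  # 8 True
```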



Proposition 2.1.9 (a) If $A$ is countable, and if $B$ is a subset of $A$, then $B$ is countable.

(b) If $A, B$ are countable, then $A \times B$ is countable.

(c) If $A, B$ are countable, then $A \cup B$ is countable.

(d) If $\mathcal{A} = \{A_n : n \in \mathbb{N}\}$ is a family of countable sets, then $\bigcup_n A_n$ is countable.



Proof: (a) If $\{a_n : n \in \mathbb{N}\}$ is an enumeration of $A$, we can obtain an enumeration of $B$
by pruning the elements of $A$ which are not in $B$. This can be accomplished inductively as
follows. Let $b_1 = a_n$, where $n$ is the least positive integer such that $a_n \in B$. Suppose now
that $b_m$ has been defined and that $b_m = a_i$. Then let $b_{m+1} = a_j$, where $j$ is the least positive
integer $> i$ such that $a_j \in B$. Clearly $\{b_m : m \in \mathbb{N}\}$ is an enumeration of $B$.

(b) One can easily prove that $\mathbb{N} \times \mathbb{N}$ is countable by copying Example 2.1.3(c). Just form an
array

(1,1) (2,1) (3,1) (4,1) ...
(1,2) (2,2) (3,2) (4,2) ...
(1,3) (2,3) (3,3) (4,3) ...

and zig-zag your way across this array. Let $f : A \to \mathbb{N}$ and $g : B \to \mathbb{N}$ be bijections. Then
the map $h : A \times B \to \mathbb{N} \times \mathbb{N}$ defined by $h(a, b) = (f(a), g(b))$ is clearly a bijection. Hence
$|A \times B| = |\mathbb{N} \times \mathbb{N}| = |\mathbb{N}|$ as required.

(c) follows from (d). 

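(d) Again we use a zig-zag: Let $\{a_{n1}, a_{n2}, a_{n3}, \dots\}$ be a listing of the elements of $A_n$. Form
an array

$$\begin{array}{cccc} a_{11} & a_{12} & a_{13} & \cdots \\ a_{21} & a_{22} & a_{23} & \cdots \\ a_{31} & a_{32} & a_{33} & \cdots \\ \vdots & \vdots & \vdots & \end{array}$$

and take a path which goes through each element once, pruning duplications.

■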
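The zig-zag through the array in part (b) can be written down explicitly. The sketch below assumes the same path as in Example 2.1.3(c): start at the top row and move diagonally down to the left. It also gives a closed formula for the position of a pair $(i, j)$ on that path (obtained by counting the entries on the earlier diagonals), which shows that the zig-zag is a bijection between $\mathbb{N} \times \mathbb{N}$ and $\mathbb{N}$. The names `pairs` and `index_of` are illustrative.

```python
from itertools import count, islice

def pairs():
    """Enumerate N x N along the diagonals i + j = s, with j the row index:
    (1,1), (2,1), (1,2), (3,1), (2,2), (1,3), ..."""
    for s in count(2):
        for j in range(1, s):
            yield (s - j, j)

def index_of(i, j):
    """Position (starting at 1) of the pair (i, j) in the enumeration above:
    the earlier diagonals hold 1 + 2 + ... + (s - 2) pairs, where s = i + j."""
    s = i + j
    return (s - 2) * (s - 1) // 2 + j

first = list(islice(pairs(), 6))
print(first)           # [(1, 1), (2, 1), (1, 2), (3, 1), (2, 2), (1, 3)]
print(index_of(2, 2))  # 5
```

No pruning is needed here, since all pairs in the array are distinct.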

Remarks 2.1.10 This proposition shows that you can't make uncountable sets using finite products and
countable unions. You can, however, make uncountable sets using infinite products and the powerset operation.
In Exercise 2.1.8 it is shown that if $A$ is infinite, then $\mathcal{P}(A)$ is uncountable. Similarly if $I$ is infinite, and $|A_i| \ge 2$
for all $i \in I$, then $\prod_I A_i$ is uncountable. We shall not need these facts. Refer to any introductory book on set
theory for proofs.

□ 



2.2 Open Sets and the Interior Operation 

The aim of the next two sections is to create a new language for talking about space. We are 
going to define a large number of concepts, and prove a large number of simple propositions. 
We will consider mainly the metric spaces, which include the normed spaces and the inner product
spaces as subclasses. To build intuition, you should consider the Euclidean spaces, and draw
pictures illustrating each concept where possible. All of the propositions in this section are
trivial, in that they only require one to plug in the appropriate definitions to prove them. 
But those definitions take some getting used to. It is therefore extremely important that you 
do all the exercises — perhaps several times over! There is no other way to learn this new 
language. 






Definition 2.2.1 Let $(X, d)$ be a metric space.

• Let $x_0 \in X$ and $r > 0$. The open ball of radius $r$ centered at $x_0$ is the set

$$B(x_0, r) := \{x \in X : d(x_0, x) < r\}$$

• Let $A \subseteq X$. A point $x_0$ is called an interior point of $A$ if and only if there is $r > 0$
such that $B(x_0, r) \subseteq A$. In that case, we say that $A$ is a neighbourhood of $x_0$.

• If $A \subseteq X$ we define the interior of $A$ by

$$A^\circ := \{x \in X : x \text{ is an interior point of } A\}$$

• A subset $U \subseteq X$ is said to be open if and only if every point in $U$ is an interior point
of $U$, i.e. iff $U$ is a neighbourhood of all its points.

Exercise 2.2.2 Let $(X, d)$ be $\mathbb{R}$ with the usual metric:

(a) Show that every open ball is an open interval, and vice versa.

(b) Find an open set which is not an open interval.

(c) Show that $[0,1]^\circ = (0,1)$. Thus $[0,1]$ is not an open set.

(d) Show that $\mathbb{Q}^\circ = \emptyset$.

□ 

Here are some properties of the collection of open sets: 



Proposition 2.2.3 Let $(X, d)$ be a metric space.

(a) Each open ball is an open set.

(b) A set is open if and only if it is a (possibly infinite) union of open balls.

(c) The family of open sets satisfies the following axioms:
(T.1) $X$ and $\emptyset$ are open.

(T.2) The union of any (possibly infinite) collection of open sets is open.
(T.3) The intersection of any finite collection of open sets is open.

Exercise 2.2.4 Prove Propn. 2.2.3. Here are some hints:

(a) Given $x \in B(x_0, r)$, you must find an $\varepsilon > 0$ such that $B(x, \varepsilon) \subseteq B(x_0, r)$. Explain why you can find an
$\varepsilon > 0$ sufficiently small so that $d(x_0, x) + \varepsilon < r$. Then use the triangle inequality to show that if $y \in B(x, \varepsilon)$,
then $d(x_0, y) \le d(x_0, x) + d(x, y) < r$. Conclude that $B(x, \varepsilon) \subseteq B(x_0, r)$.

(b) Suppose that $A$ is open. Explain why for each $a \in A$ we can find $r_a > 0$ such that $B(a, r_a) \subseteq A$. Now
show $A = \bigcup_{a \in A} B(a, r_a)$.

Conversely, suppose that $A = \bigcup_{i \in I} B(x_i, r_i)$, and that $a \in A$. Then there is $i \in I$ such that $a \in B(x_i, r_i)$.
Use (a) to conclude that $a$ is an interior point of $A$.

(c) (T.1) If $A$ is not open, then there is $x \in A$ such that, for all $r > 0$, we have $B(x, r) \not\subseteq A$. In particular,
there is $x \in A$, i.e. $A$ is non-empty.

(T.3) If $U_1, \dots, U_n$ are open, and $x \in U_1 \cap \dots \cap U_n$, then there exist $r_1, \dots, r_n > 0$ such that $B(x, r_i) \subseteq U_i$
for $i = 1, \dots, n$. Take $r = \min\{r_1, \dots, r_n\}$ and explain why $B(x, r) \subseteq U_1 \cap \dots \cap U_n$.

□

Remarks* 2.2.5 A topological space is a pair $(X, \mathcal{T})$ where $X$ is a set and $\mathcal{T}$ is a family of subsets of $X$
with the following properties:

(T.1) $X, \emptyset \in \mathcal{T}$.

(T.2) If $U_i \in \mathcal{T}$ for $i \in I$, then $\bigcup_{i \in I} U_i \in \mathcal{T}$.

(T.3) If $U_1, \dots, U_n \in \mathcal{T}$ for some $n \in \mathbb{N}$, then $U_1 \cap \dots \cap U_n \in \mathcal{T}$.

The elements of $\mathcal{T}$ are called the open sets of the topological space $X$. Propn. 2.2.3(c) shows that any metric
space is a topological space. The notion of topological space is more general (and more abstract) than that of
metric space. In this course, we shall not need this level of generality, however.

□ 

Exercise* 2.2.6 In the above, the notion of open set depends on that of open ball, and that, in turn,
depends on the notion of metric. However, it is possible for two distinct metrics to yield the same family of
open sets, as we shall now show. Let $d, d_1$ be two metrics on $\mathbb{R}^n$, where $d$ is induced by the usual (Euclidean)
norm $\|\cdot\|$, and $d_1$ is induced by the $\|\cdot\|_1$-norm, given by $\|x\|_1 := |x_1| + \dots + |x_n|$.

(a) For the case $n = 2$, draw the open balls $B(0, 1)$ w.r.t. both metrics $d, d_1$. Note that the first ball is actually
a ball (i.e. round), whereas the second "ball" is diamond-shaped.

(b) Explain why every $d$-ball is a $d_1$-open set.

(c) Explain why every $d_1$-ball is a $d$-open set.

(d) Conclude that every $d$-open set is a $d_1$-open set and vice versa.

The metrics $d, d_1$ on $\mathbb{R}^n$ are said to be equivalent: They generate the same family of open sets, and thus the
same topology.

□ 
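Parts (b) and (c) rest on comparing the two norms. A standard pair of inequalities (not stated in the exercise, so treat it as a hint to be proved) is $\|x\| \le \|x\|_1 \le \sqrt{n}\,\|x\|$ for all $x \in \mathbb{R}^n$; it implies that every $d_1$-ball of radius $r$ lies in the $d$-ball of radius $r$ with the same centre, and that every $d$-ball of radius $r$ lies in the $d_1$-ball of radius $\sqrt{n}\,r$. The sketch below merely tests the inequalities on random vectors; the function names and the tolerance `slack` are illustrative.

```python
import math
import random

def norm2(x):
    """Euclidean norm."""
    return math.sqrt(sum(t * t for t in x))

def norm1(x):
    """The 1-norm |x_1| + ... + |x_n|."""
    return sum(abs(t) for t in x)

def check(n, trials=1000, seed=0, slack=1e-9):
    """Test ||x|| <= ||x||_1 <= sqrt(n) ||x|| on random vectors in R^n."""
    rng = random.Random(seed)
    for _ in range(trials):
        x = [rng.uniform(-1, 1) for _ in range(n)]
        if not (norm2(x) <= norm1(x) + slack
                and norm1(x) <= math.sqrt(n) * norm2(x) + slack):
            return False
    return True

print(norm1([3, -4]), norm2([3, -4]))  # 7 5.0
print(check(2), check(5))              # True True
```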

Here are some properties of the interior operation: 

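Proposition 2.2.7 Let $(X, d)$ be a metric space, and let $A, B \subseteq X$.

(a) $A^\circ \subseteq A$.

(b) $A$ is open iff $A^\circ = A$.

(c) $A \subseteq B$ implies $A^\circ \subseteq B^\circ$.

(d) $A^{\circ\circ} = A^\circ$.

(e) $(A \cap B)^\circ = A^\circ \cap B^\circ$.

(f) $A^\circ$ is the union of all open subsets of $A$.

(g) $A^\circ$ is the largest open subset of $A$.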

Exercise 2.2.8 (a) Prove Propn. 2.2.7.

(b) Show that $(A \cup B)^\circ \supseteq A^\circ \cup B^\circ$, and find an example to show that we do not generally have $(A \cup B)^\circ = A^\circ \cup B^\circ$.

(c) Find an example to show that generally $A^\circ \subseteq B^\circ$ need not imply $A \subseteq B$.

□ 






2.3 Closed Sets and the Closure Operation 

The notion of closed set is complementary to that of open set. It is therefore formally superfluous, but many results can be more intuitively comprehended when stated in terms of
closed sets, rather than open sets.

Definition 2.3.1 Let $(X, d)$ be a metric space. A subset $C \subseteq X$ is said to be closed if
and only if its complement $C^c := X - C$ is open.

Exercise 2.3.2 (a) Let $(X, d)$ be $\mathbb{R}$ with the usual metric.

(i) Show that closed intervals are closed sets.

(ii) Show that $(0, 1]$ is neither open, nor closed.

(b) Use de Morgan's laws to show the analogues of (T.1)-(T.3) for closed sets:

(i) $X$ and $\emptyset$ are closed.

(ii) The intersection of a (possibly infinite) collection of closed sets is closed. 

(iii) The union of finitely many closed sets is closed. 

(c) Show that the intersection of infinitely many open sets need not be open, and that the union of infinitely 
many closed sets need not be closed. 

□ 

Exercise 2.3.3 Show that there may be sets which are neither open nor closed. In particular, find a subset
of $\mathbb{R}$ (equipped with the usual metric) which is neither open nor closed.

□ 

We now show that closed means closed under limits: 

Proposition 2.3.4 Suppose that $(X, d)$ is a metric space, and that $C \subseteq X$. Then the
following are equivalent:

(i) $C$ is closed.

(ii) Whenever $(c_n)_n$ is a sequence in $C$ which converges, then it converges to a point in
$C$, i.e.

$$c_n \in C \text{ and } c_n \to x \text{ implies } x \in C$$

Proof: ($\Rightarrow$) Suppose $C$ is closed in $X$, and that $(c_n)_n$ is a sequence in $C$ which converges to a point $x$.
We must show $x \in C$, and we argue by contradiction: If $x \notin C$, then $x \in C^c$.
Since $C$ is closed, $C^c$ is open, so there is $r > 0$ so that $B(x, r) \subseteq C^c$. Since $c_n \to x$, we
have $d(c_n, x) < r$ eventually (i.e. $\exists N \in \mathbb{N}\ \forall n \ge N\ [d(c_n, x) < r]$). Then $c_n \in B(x, r) \subseteq C^c$
eventually, contradicting the assumption that $c_n \in C$ for all $n$.

($\Leftarrow$) We prove the contrapositive, i.e. we show that $\neg$(i) implies $\neg$(ii): Suppose that $C$ is not
closed, i.e. that $C^c$ is not open. Then there is a point $x$ in $C^c$ which is not an interior point
of $C^c$. Thus for every $r > 0$ we have $B(x, r) \not\subseteq C^c$. Let $r_n > 0$ be real numbers such that
$r_n \to 0$ (e.g. define $r_n := \frac{1}{n}$). Since $B(x, r_n) \not\subseteq C^c$, we must have $B(x, r_n) \cap C \ne \emptyset$. Choose,
therefore, $c_n \in B(x, r_n) \cap C$. Then $d(c_n, x) < r_n$, so $c_n \to x$, yet $x \notin C$.

■






Definition 2.3.5 Let $(X, d)$ be a metric space, and let $A \subseteq X$.

• A point $x \in X$ is said to be a cluster pointᵃ of $A$ iff for every $r > 0$ there exists
$y \in B(x, r) \cap A$ such that $y \ne x$.

• A point $x \in X$ is said to be a boundary point of $A$ if and only if for every $r > 0$, both

$$B(x, r) \cap A \ne \emptyset \quad \text{and} \quad B(x, r) \cap A^c \ne \emptyset$$

The set of all boundary points of $A$ is called the boundary of $A$, and denoted by $\partial A$.

• The closure of $A$ is defined to be the set $A$ together with all its cluster points, i.e.

$$\bar{A} := \{x \in X : x \in A \text{ or } x \text{ is a cluster point of } A\}$$

ᵃ Cluster points are also called limit points or accumulation points in the literature.

Exercise 2.3.6 (a) Find all the cluster points of the following subsets of $\mathbb{R}$ (equipped with the usual
metric):

$$A := (0,1) \qquad B := [0,1] \qquad C := (0,1) \cup \{2\} \qquad D := \{\tfrac{1}{n} : n \in \mathbb{N}\} \qquad E := \mathbb{Q}$$

(b) Find the boundaries $\partial A, \dots, \partial E$ of the sets $A, \dots, E$ above.

(c) Find the closures $\bar{A}, \dots, \bar{E}$ of the sets $A, \dots, E$ above.

□ 

Remarks 2.3.7 A point $x$ may be a cluster point of a set $A$ without actually belonging to $A$. The same
goes for boundary points. (Of course, if $x$ is an interior point of $A$, we must have $x \in A$.)

□ 

Here are some exercises on cluster points, boundaries and closures: 

Exercise 2.3.8 (a) Show that if $A \subseteq B$, then $\bar{A} \subseteq \bar{B}$.

(b) Show that $A$ and $A^c$ have the same boundary, i.e. that $\partial A = \partial A^c$.

(c) Show that $x$ is a cluster point of $A$ iff for every $r > 0$ the set $B(x, r) \cap A$ is an infinite set.

[Hints: (c) Suppose that $x$ is a cluster point of $A$. If $B(x, r) \cap A$ is finite, let $\{y_1, \dots, y_m\}$ be the set of all points
in $B(x, r) \cap A$ such that $y_i \ne x$. Now choose $s$ so that $0 < s < \min_{1 \le i \le m} d(y_i, x)$ and consider $B(x, s) \cap A$.]

□ 

Here follows an intuitive characterization of open and closed sets in terms of their boundaries:

Proposition 2.3.9 Let $(X, d)$ be a metric space.
(i) $U \subseteq X$ is open iff $U$ contains none of its boundary points, i.e.

$$U \text{ is open iff } U \cap \partial U = \emptyset$$

(ii) $C \subseteq X$ is closed iff $C$ contains all of its boundary points, i.e.

$$C \text{ is closed iff } \partial C \subseteq C$$






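Proof: (i) ($\Rightarrow$): If $U$ is open and $x \in U$, then there is $r > 0$ such that $B(x, r) \subseteq U$. Thus
$B(x, r) \cap U^c = \emptyset$, which implies that $x \notin \partial U$. Hence $U \cap \partial U = \emptyset$ when $U$ is open.

(i) ($\Leftarrow$): Suppose that $U \cap \partial U = \emptyset$. If $x \in U$, then $x \notin \partial U$. Hence there is $r > 0$ such that
$B(x, r) \cap U^c = \emptyset$, from which it follows that $B(x, r) \subseteq U$. Hence $x$ is an interior point of $U$.
Since $x$ was arbitrary, every point of $U$ is an interior point of $U$, and so $U$ is open.

(ii): We use (i) and the fact that $\partial(C^c) = \partial C$ to argue that

$$C \text{ is closed} \iff C^c \text{ is open} \iff C^c \cap \partial(C^c) = \emptyset \iff C^c \cap \partial C = \emptyset \iff \partial C \subseteq C$$

■

The closure of a set can also be characterized in terms of boundary points:

Proposition 2.3.10

$$\bar{A} = A \cup \partial A$$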





Proof: First note that

$$\bar{A} - A = \partial A - A$$

i.e. that if $x \notin A$, then $x$ is a boundary point if and only if $x$ is a cluster point of $A$: For if
$x \in \partial A - A$, then each set $B(x, r) \cap A$ contains some point, and that point cannot be $x$, as
$x \notin A$. Thus $x$ is a cluster point of $A$, i.e. $x \in \bar{A} - A$.

Conversely, if $x \in \bar{A} - A$, then $x \in B(x, r) \cap A^c$ for all $r > 0$, and hence both $B(x, r) \cap A^c$
and $B(x, r) \cap A$ are non-empty, as $x$ is a cluster point of $A$. Hence $x$ is a boundary point of
$A$, i.e. $x \in \partial A - A$.

The result is now obvious: Since $A \subseteq \bar{A}$ we have $\bar{A} = A \cup (\bar{A} - A) = A \cup (\partial A - A) = A \cup \partial A$.

■

The notions of cluster point and closure can seem difficult at first contact, but they're 
very important. The next few results may result in you pulling out your hair (but it will grow 
back, so push on) . 

Proposition 2.3.11 Let $(X, d)$ be a metric space, and let $A \subseteq X$. Then the following are
equivalent:

(i) $x \in \bar{A}$.

(ii) There exists a sequence $(a_n)_n$ in $A$ such that $a_n \to x$.

(iii) For every $r > 0$, we have $B(x, r) \cap A \ne \emptyset$.

Proof: (i) $\Rightarrow$ (ii): Suppose that $x \in \bar{A}$. Then either $x \in A$ or $x$ is a cluster point of $A$. If $x$
is a cluster point of $A$, choose $a_n \in B(x, \frac{1}{n}) \cap A$. Else, if $x \in A$, define
$a_n := x$ for all $n$. In either case, we have $a_n \in A$ for all $n$, and $a_n \to x$.

(ii) $\Rightarrow$ (iii): Suppose that $(a_n)$ is a sequence in $A$ such that $a_n \to x$, and let $r > 0$ be
arbitrary. Then $a_n \in B(x, r)$ eventually, and so $B(x, r) \cap A \ne \emptyset$.

(iii) $\Rightarrow$ (i): Suppose that for all $r > 0$ we have $B(x, r) \cap A \ne \emptyset$. If $x \in A$, then certainly $x \in \bar{A}$.
If $x \notin A$, then (by the definition) $x$ is a cluster point of $A$, and hence again we conclude that
$x \in \bar{A}$.

■

Proposition 2.3.12 Let $(X, d)$ be a metric space, and let $A \subseteq X$. Then $\bar{A}$ is the smallest
closed set which contains $A$.

In particular, $A$ is closed iff $\bar{A} = A$.

Exercise 2.3.13 We prove Propn. 2.3.12. Let $(X, d)$ be a metric space, and let $A \subseteq X$.

(a) We first show, by contradiction, that if $B(x, r) \cap A = \emptyset$ for $x \in X$ and $r > 0$, then the (ostensibly larger) set
$B(x, r) \cap \bar{A}$ is empty as well. Assume, therefore, that $B(x, r) \cap A = \emptyset$ but that there exists $y \in B(x, r) \cap \bar{A}$.

(a.1) Explain why $y$ is a cluster point of $A$. [Look at the definition of $\bar{A}$.]

(a.2) Explain why there is $s > 0$ such that $B(y, s) \subseteq B(x, r)$, and conclude that $B(y, s) \cap A = \emptyset$.

(a.3) Explain why this is a contradiction.

(b) Next, we show that $\bar{A}$ is a closed set, i.e. that $(\bar{A})^c$ is open.

(b.1) Let $x \in (\bar{A})^c$. Use Propn. 2.3.11 to conclude that there is $r > 0$ such that $B(x, r) \cap A = \emptyset$.

(b.2) Now use (a) to conclude that $B(x, r) \subseteq (\bar{A})^c$.

(b.3) Explain why this shows that $(\bar{A})^c$ is open, and conclude that $\bar{A}$ is a closed set.

(c) Finally, we show that $\bar{A}$ is the smallest closed set which contains $A$, i.e. that if $C$ is a closed set such that
$C \supseteq A$, then $C \supseteq \bar{A}$ as well. Assume, therefore, that $C$ is a closed set such that $C \supseteq A$.

(c.1) Let $x \in \bar{A}$ be arbitrary. Explain why there is a sequence $(a_n)_n$ in $A$ such that $a_n \to x$.

(c.2) Use Propn. 2.3.4 to conclude that $x \in C$.

(c.3) Conclude that $\bar{A} \subseteq C$.



□ 



2.4 Compact Spaces and Sets 

Compactness is one of the most important notions in analysis and topology. Yet it is very
difficult to explain where the definition comes from. In some ways, compactness is a generalization of finiteness. Because the notion is so unfamiliar, we will define two notions of
compactness, one in terms of open sets, and one in terms of sequences. We will then show
that the two notions coincide in metric spaces¹. It will also transpire that the compact sets
in $\mathbb{R}^n$ (with the usual metric) have a simple characterization: They are precisely the closed
and bounded sets.

Here is our first definition of compactness — in terms of open sets: 



¹ They need not coincide in more general topological spaces, though.






Definition 2.4.1 Let $(X, d)$ be a metric space, and let $A \subseteq X$.

• An open cover of $A$ is a family $\{U_i : i \in I\}$ of open sets in $X$ such that

$$A \subseteq \bigcup_{i \in I} U_i$$

• The set $A$ is compact if every open cover of $A$ has a finite subcover (i.e. if whenever
$\mathcal{U} = \{U_i : i \in I\}$ is an open cover of $A$, there exists a finite subfamily $U_{i_1}, \dots, U_{i_n} \in \mathcal{U}$
such that $A \subseteq \bigcup_{k=1}^{n} U_{i_k}$).

Examples 2.4.2 (a) Every finite subset of a metric space $(X, d)$ is compact. Why?

(b) The space $\mathbb{R}^n$ (with the usual metric) is not compact. For example, if $U_k = B(0, k)$, then $\{U_k : k \in \mathbb{N}\}$ is
an open cover of $\mathbb{R}^n$. Yet it clearly has no finite subcover. Why not? The same argument shows that no
unbounded subset of $\mathbb{R}^n$ can be compact, i.e. compact subsets of $\mathbb{R}^n$ are necessarily bounded.

(c) No open interval $(a, b)$ is compact in $\mathbb{R}$: Let $U_n = (a + \frac{1}{n}, b - \frac{1}{n})$. Then clearly $(a, b) = \bigcup_n U_n$ (i.e. $\{U_n\}_n$
is an open cover of $(a, b)$). Yet $\{U_n\}_n$ clearly has no finite subcover of $(a, b)$. Why not?

□ 

The following exercise shows that there are infinite compact sets in $\mathbb{R}$:

Exercise 2.4.3 We prove that the closed unit interval [0, 1] is a compact subset of R. 

(a) Let $I = [0, 1]$ be the closed unit interval, and let $\mathcal{U} = \{U_\gamma : \gamma \in \Gamma\}$ be an open cover of $I$. Define $I^*$ to be
the set of all those $x \in I$ for which $[0, x]$ can be covered by a finite subfamily of $\mathcal{U}$:

$I^* = \{x \in [0, 1] : \exists \gamma_1, \dots, \gamma_n \in \Gamma \ ([0, x] \subseteq U_{\gamma_1} \cup \dots \cup U_{\gamma_n})\}$

(b) Explain why $0 \in I^*$.

(c) Show that $I^*$ is a subinterval of $I$: If $x \in I^*$ and $0 \le y \le x$, then $y \in I^*$.

(d) Define $x^* = \sup I^*$. Explain why $0 \le x^* \le 1$.

(e) Explain why there is $\gamma^* \in \Gamma$ such that $x^* \in U_{\gamma^*}$.

(f) Explain why $x^* \in I^*$.

(g) Assume now that $x^* < 1$. Explain why there is $\varepsilon > 0$ such that $[x^* - \varepsilon, x^* + \varepsilon] \subseteq U_{\gamma^*} \cap [0, 1]$.

(h) Explain why $[0, x^* + \varepsilon]$ can be covered by a finite subfamily of $\mathcal{U}$.

(i) Conclude that $x^* + \varepsilon \in I^*$.

(j) Explain why this is a contradiction.

(k) Deduce that $1 \in I^*$, and thus that $I$ can be covered by a finite subfamily of $\mathcal{U}$.

□ 

The above exercise can easily be generalized to show that any closed interval $[a, b]$ in $\mathbb{R}$ is
compact.
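The "push the covered region to the right" idea behind Exercise 2.4.3 can be tried out numerically. The following is only a sketch under simplifying assumptions: the cover is already a finite list of open intervals $(a, b)$ given as pairs, and the helper name `finite_subcover` is ours, not part of the notes. The loop keeps the invariant that $[0, \text{reached})$ is covered, which mirrors the role of $x^* = \sup I^*$.

```python
def finite_subcover(intervals, lo=0.0, hi=1.0):
    """Greedily pick open intervals (a, b) whose union contains [lo, hi].

    Returns the chosen intervals, or None if some point of [lo, hi]
    is not covered by the given family.
    """
    chosen = []
    reached = lo  # invariant: every point of [lo, reached) is already covered
    while True:
        # Open intervals that contain the first point not yet known to be covered.
        candidates = [(a, b) for (a, b) in intervals if a < reached < b]
        if not candidates:
            return None  # `reached` lies in no interval: the family is not a cover
        best = max(candidates, key=lambda ab: ab[1])  # extends furthest to the right
        chosen.append(best)
        if best[1] > hi:
            return chosen  # all of [lo, hi] is now covered
        reached = best[1]  # the right endpoint itself is not in the open interval


cover = [(-0.1, 0.3), (0.2, 0.6), (0.25, 0.5), (0.55, 1.1), (0.9, 1.2)]
subcover = finite_subcover(cover)
not_a_cover = finite_subcover([(-0.1, 0.3), (0.3, 1.1)])  # the point 0.3 is missed
```

The interesting content of the exercise is of course the case of an infinite cover, which no finite computation can exhibit; the sketch only illustrates why the supremum argument advances strictly at every step.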

We would like to give some other, perhaps more user-friendly, criteria for compactness
in metric spaces. We have seen that every compact subset of $\mathbb{R}$ is necessarily bounded. But
in $\mathbb{R}$, the (finite) closed intervals are compact, whereas the open intervals $(a, b)$ are not.
Boundedness is a requirement, but it is not sufficient:

Exercise 2.4.4 A subset $A$ of a metric space $(X, d)$ is said to be bounded if and only if there exists an
open ball $B(x, r)$ such that $A \subseteq B(x, r)$.

(a) Show that the union of two (and hence of finitely many) bounded sets is bounded.

(b) Show that a compact set is bounded.

(c) Show that a bounded set need not be compact.

[Hints: (a) If $A \subseteq B(x_1, r_1)$, $B \subseteq B(x_2, r_2)$ are bounded, show that $A \cup B \subseteq B(x_1, R)$, for any
$R \ge \max\{d(x_1, x_2) + r_2, r_1\}$.

(b) Note that if $A \subseteq X$, then, for any $r > 0$, we have $A \subseteq \bigcup_{x \in A} B(x, r)$. If $A$ is compact, then there are
$x_1, \dots, x_n \in A$ such that $A \subseteq \bigcup_{k=1}^{n} B(x_k, r)$. Now use (a).]

□ 



Proposition 2.4.5 (a) A compact subset of a metric space is closed and bounded.
(b) A closed subset of a compact set is compact.



Proof: (a) Suppose that $K$ is compact. From Exercise 2.4.4, we know that $K$ is bounded.
To prove $K$ is closed, we need only show that its complement $K^c$ is open. Fix $x_0 \in K^c$. For
each $y \in K$, let $r_y \in \mathbb{R}$ be such that $0 < r_y < \frac{1}{2} d(x_0, y)$. Define

$U_y := B(x_0, r_y)$,  $V_y := B(y, r_y)$

By construction, we have $x_0 \in U_y$, $y \in V_y$ and $U_y \cap V_y = \emptyset$. Note that $K \subseteq \bigcup_{y \in K} V_y$
(because if $y \in K$, then $y \in V_y$), and thus by compactness there are $y_1, \dots, y_n \in K$ so that
$K \subseteq V_{y_1} \cup \dots \cup V_{y_n} =: V$. Now define $U := U_{y_1} \cap \dots \cap U_{y_n}$. Then:

(i) $U$ is open.

(ii) $x_0 \in U$.

(iii) $U \cap V = \emptyset$, for if $z \in V$, then $z \in V_{y_j}$ for some $1 \le j \le n$, so that $z \notin U_{y_j}$, and hence
$z \notin U$.

(iv) $U \subseteq K^c$, because $U \cap K \subseteq U \cap V = \emptyset$. (Recall that $A \subseteq B^c \iff A \cap B = \emptyset$.)

We have thus found an open set $U$ so that $x_0 \in U \subseteq K^c$. Hence $x_0$ is an interior point of $K^c$.
Since $x_0$ was arbitrary, all points of $K^c$ are interior points, i.e. $K^c$ is open.

(b) Suppose that $K$ is a compact subset of a metric space $(X, d)$, and that $C \subseteq K$ is closed.
Let $\mathcal{U} := \{U_i : i \in I\}$ be an open cover of $C$. Since $C$ is closed, $V := C^c$ is open. Now note
that if $x \in K$, then either $x \in C$ or $x \in C^c$. In the first case, $x \in \bigcup_i U_i$ (because $\mathcal{U}$ covers
$C$), whereas in the second $x \in V$. In either case $x \in \bigcup_i U_i \cup V$ for any $x \in K$. It follows
that $\mathcal{U} \cup \{V\}$ is an open cover of $K$. As $K$ is compact, there exist $i_1, \dots, i_m \in I$ so that
$K \subseteq U_{i_1} \cup \dots \cup U_{i_m} \cup V$. As $V = C^c$, we see that $C \subseteq U_{i_1} \cup \dots \cup U_{i_m}$, i.e. we have found a
finite subcover of $\mathcal{U}$ for $C$.

∎

We can say even more: 



Proposition 2.4.6 If a subset $K$ of a metric space is compact, then it is complete, i.e.
every Cauchy sequence in $K$ converges to a limit in $K$.






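Proof: Suppose that $(x_n)_n$ is a Cauchy sequence in $K$. Then for each $m \in \mathbb{N}$, there exists
$N_m \in \mathbb{N}$ so that $d(x_n, x_{N_m}) < \frac{1}{m}$ whenever $n \ge N_m$. Define open subsets

$G_m := \{y \in X : d(y, x_{N_m}) > \frac{1}{m}\}$

and note that $x_n \notin G_m$ when $n \ge N_m$. It follows that no finite family of $G_m$'s covers $K$:
For if $m_1, \dots, m_k \in \mathbb{N}$, we have that $x_n \notin \bigcup_{j=1}^{k} G_{m_j}$ as soon as $n \ge \max\{N_{m_1}, \dots, N_{m_k}\}$.
Because $K$ is compact, it follows that $\bigcup_{m=1}^{\infty} G_m \not\supseteq K$, and thus that there is $x \in K - \bigcup_m G_m$.
We claim that the sequence $(x_n)_n$ converges to $x$: If $\varepsilon > 0$, choose $m \in \mathbb{N}$ so that $\frac{2}{m} < \varepsilon$.
Since $x \notin G_m$, we have $d(x, x_{N_m}) \le \frac{1}{m}$, and thus if $n \ge N_m$, we have

$d(x, x_n) \le d(x, x_{N_m}) + d(x_n, x_{N_m}) < \frac{2}{m} < \varepsilon$

Hence $x_n \to x$, where $x \in K$.

∎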

We are now ready to introduce our second notion of compactness, in terms of sequences: 

Definition 2.4.7 A subset $K$ of a metric space $(X, d)$ is said to be sequentially compact
if and only if every sequence in $K$ has a subsequence which converges to some element of $K$.


Example 2.4.8 Any closed and bounded interval $[a, b]$ is sequentially compact in $\mathbb{R}$. For suppose that
$(x_n)_n$ is a sequence in $[a, b]$. Then it has a subsequence which converges to $x := \limsup_n x_n$. Clearly $x \in [a, b]$,
and so $(x_n)_n$ has a subsequence which converges to a point in $[a, b]$.

□ 

We are now ready to prove the promised equivalence: 

Theorem 2.4.9 Suppose that $(X, d)$ is a metric space and that $K \subseteq X$. The following
are equivalent:

(a) $K$ is compact, i.e. every open cover of $K$ has a finite subcover.

(b) Every infinite subset of $K$ has a cluster point which belongs to $K$.

(c) $K$ is sequentially compact, i.e. every sequence in $K$ has a convergent subsequence
whose limit is in $K$.

Proof: (a) $\Rightarrow$ (b): Suppose that $A \subseteq K$ is infinite, but that $A$ has no cluster points. Since
a set is closed iff it contains all its cluster points, any set without cluster points is necessarily
closed. It follows that $A$ is closed, and thus that $V := A^c$ is open. Furthermore, if $A$ has no
cluster points, then no $a \in A$ is a cluster point of $A$, and hence for each $a \in A$ there is an
open ball $U_a := B(a, r_a)$ so that $U_a \cap A = \{a\}$. It is now easy to see that $X = \bigcup_{a \in A} U_a \cup V$,
and thus that $K \subseteq \bigcup_{a \in A} U_a \cup V$. By compactness of $K$, there are $a_1, \dots, a_n \in A$ such that
$A \subseteq U_{a_1} \cup \dots \cup U_{a_n} \cup V$, and thus that $A \subseteq U_{a_1} \cup \dots \cup U_{a_n}$, as $A \cap V = \emptyset$. But then
$A = (U_{a_1} \cap A) \cup \dots \cup (U_{a_n} \cap A) = \{a_1\} \cup \dots \cup \{a_n\} = \{a_1, \dots, a_n\}$, i.e. $A$ is finite, a
contradiction. Hence $A$ must have a cluster point. Since $K$ is closed (Propn. 2.4.5) and $A \subseteq K$,
this cluster point belongs to $K$.

(b) $\Rightarrow$ (c): Suppose that $(x_n)_n$ is a sequence in $K$. If the set $\{x_n : n \in \mathbb{N}\}$ is finite (i.e. if
there are only finitely many different values of the $x_n$), then $(x_n)_n$ obviously has a convergent
subsequence, because there must be $x$ such that $x_n = x$ for infinitely many $n$, and we can take
the subsequence to consist of just those $x_n$: a constant subsequence. Suppose, therefore,
that the range $\{x_n : n \in \mathbb{N}\}$ is infinite. Then by (b) it has a cluster point $x \in K$, and it is
easy to see how to choose a subsequence which converges to $x$: Simply pick $n_1 < n_2 < \dots$ so
that $x_{n_k} \in K \cap B(x, \frac{1}{k})$, for all $k \in \mathbb{N}$.

(c) $\Rightarrow$ (a): Let $\mathcal{U}$ be an open cover of the sequentially compact set $K$. We first show that
$\mathcal{U}$ has the following property:

$\exists r > 0 \ \forall x \in K \ \exists U \in \mathcal{U} \ [B(x, r) \subseteq U]$  (*)

If (*) fails, then there is, for each $n \in \mathbb{N}$, an $x_n \in K$ so that $B(x_n, \frac{1}{n}) \not\subseteq U$ for any $U \in \mathcal{U}$. By
(c), the sequence $(x_n)_n$ has a subsequence $(x_{n_k})_k$ which converges to some $x \in K$. Now $x \in U$
for some $U \in \mathcal{U}$ (because the $U$'s cover $K$ and $x \in K$), and hence there is $s > 0$ so that
$B(x, s) \subseteq U$. Now choose $k$ sufficiently large that both $d(x_{n_k}, x) < \frac{s}{2}$ and $\frac{1}{n_k} < \frac{s}{2}$. Then

$y \in B(x_{n_k}, \frac{1}{n_k}) \ \Rightarrow \ d(y, x) \le d(y, x_{n_k}) + d(x_{n_k}, x) < \frac{1}{n_k} + \frac{s}{2} < s \ \Rightarrow \ y \in B(x, s)$

so that $B(x_{n_k}, \frac{1}{n_k}) \subseteq B(x, s) \subseteq U$. This contradicts the definition of $x_{n_k}$, and hence (*) holds.

With (*) now proved, we can proceed: Choose $r > 0$ so that for every $x \in K$ there is
$U \in \mathcal{U}$ so that $B(x, r) \subseteq U$, and let $y_1 \in K$ be arbitrary. If possible, choose inductively
$y_{n+1} \in K$ so that $d(y_{n+1}, y_j) \ge r$ for all $j = 1, \dots, n$. If this is possible for all $n$, one would
obtain a sequence $(y_n)_n$ in $K$ with $d(y_n, y_m) \ge r$ for all $n \ne m$, and such a sequence cannot
have a convergent subsequence. Hence there is $n$ for which it is impossible to choose $y_{n+1}$,
and hence $K \subseteq \bigcup_{j=1}^{n} B(y_j, r)$. By definition of $r$ there exist $U_j \in \mathcal{U}$ so that $B(y_j, r) \subseteq U_j$ for
$j = 1, \dots, n$. Thus $K \subseteq \bigcup_{j=1}^{n} U_j$ yields a finite subcover.

∎
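The inductive choice of the points $y_1, y_2, \dots$ in the last step is a greedy selection of an $r$-separated set, and it can be imitated on a finite sample of points. The sketch below is an illustration only: the name `greedy_net`, the sample grid in the unit square and the value $r = 0.3$ are our own choices, and a finite sample can of course not prove anything about a general compact set.

```python
import math


def greedy_net(points, r):
    """Pick centers greedily so that any two centers are at distance >= r.

    When the loop ends, no further point can be added, so every point of
    `points` lies within distance < r of some chosen center.
    """
    centers = []
    for p in points:
        if all(math.dist(p, c) >= r for c in centers):
            centers.append(p)
    return centers


# Hypothetical sample: an 11 x 11 grid in the unit square.
points = [(i / 10, j / 10) for i in range(11) for j in range(11)]
centers = greedy_net(points, 0.3)
```

As in the proof, the selection stops exactly when the balls of radius $r$ around the chosen centers cover everything.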

2.5 Compactness in $\mathbb{R}^d$

It is easy to characterize compactness in Euclidean space $\mathbb{R}^d$: The compact sets are precisely
those which are closed and bounded. To prove this, we begin with a generalization of the
Bolzano-Weierstrass Theorem (cf. Thm A.3.18) to higher (but finite) dimensions:



Theorem 2.5.1 (Bolzano-Weierstrass)

(a) Every bounded sequence in $\mathbb{R}^d$ has a convergent subsequence.

(b) Every infinite bounded subset of $\mathbb{R}^d$ has a cluster point.



Exercise 2.5.2 We prove the Bolzano-Weierstrass Theorem for $\mathbb{R}^d$. The proof is straightforward, but
we need to recall the Bolzano-Weierstrass Theorem for $\mathbb{R}^1$: Every bounded sequence in $\mathbb{R}$ has a convergent
subsequence.

(a) Suppose that $(x_n)_n$ is a bounded sequence in $\mathbb{R}^d$, where

$x_n = (x_n^1, x_n^2, \dots, x_n^d)$

(Here, $x_n^i$ does not mean $x_n$ to the power $i$; it simply means the $i$-th component of $x_n$.) We want to
show that $(x_n)_n$ has a convergent subsequence.

(a.1) Explain why there is a subsequence $(x^1_{n'})_{n'}$ of the sequence $(x^1_n)_n$ which converges to some limit,
$x^1_{n'} \to a^1$.

(a.2) Consider now the subsequence $(x^2_{n'})_{n'}$ of $(x^2_n)_n$, which consists of the same indices $n'$ as given
in (a.1). Explain why $(x^2_{n'})_{n'}$ has a further subsequence $(x^2_{n''})_{n''}$ which converges to some limit,
$x^2_{n''} \to a^2$. (Again, $a^2$ is not $a$ squared, but the second component of a vector $a$ that we will define
below.)

(a.3) Also explain why $x^1_{n''} \to a^1$.

(a.4) Next, consider the subsequence $(x^3_{n''})_{n''}$ of $(x^3_n)_n$ which consists of the same indices $n''$ as in
(a.2). Explain why $(x^3_{n''})_{n''}$ has a further subsequence $(x^3_{n'''})_{n'''}$ which converges to some limit,
$x^3_{n'''} \to a^3$.

(a.5) Also explain why $x^1_{n'''} \to a^1$ and $x^2_{n'''} \to a^2$.

(a.6) We can proceed in this way, component by component, until we produce a subsequence $(x_{n^{(d)}})_{n^{(d)}}$
of $(x_n)_n$ with the properties that

$x^1_{n^{(d)}} \to a^1$,  $x^2_{n^{(d)}} \to a^2$,  $\dots$,  $x^d_{n^{(d)}} \to a^d$

Thus we have $x_{n^{(d)}} \to a$ in $\mathbb{R}^d$, where $a := (a^1, a^2, \dots, a^d)$.

(b) Suppose that $A \subseteq \mathbb{R}^d$ is a bounded infinite set. We must show that $A$ has a cluster point.

(b.1) First explain why we may choose a sequence $(x_n)_n$ in $A$ so that $x_n \ne x_m$ when $n \ne m$.
(b.2) By (a), there is a subsequence $(x_{n'})_{n'}$ which converges. Explain why $x := \lim_{n'} x_{n'}$ is a cluster point
of $A$.

□ 



Theorem 2.5.3 (Heine-Borel) A subset of $\mathbb{R}^d$ is compact if and only if it is closed and
bounded.



Proof: ($\Rightarrow$): This follows immediately from Propn. 2.4.5(a).

($\Leftarrow$): Suppose that $C \subseteq \mathbb{R}^d$ is closed and bounded, and let $(x_n)_n$ be a sequence in $C$. By the
Bolzano-Weierstrass Theorem, $(x_n)_n$ has a convergent subsequence, $x_{n'} \to x$. Because $C$ is
closed, we have $x \in C$. Hence every sequence in $C$ has a subsequence which converges to a
point in $C$, i.e. $C$ is sequentially compact, and therefore compact by Thm. 2.4.9.



2.6 Convergence and Continuity 

In this section we take another look at continuity. To do that, we need a little set theory:

2.6.1 Pull-backs and Push-forwards



Definition 2.6.1 Let $X, Y$ be sets, and let $f : X \to Y$ be an arbitrary function.

• If $A \subseteq X$, then the push-forward (or direct image) of $A$ along $f$ is the set $f[A] \subseteq Y$
defined by

$f[A] := \{f(x) : x \in A\} = \{y \in Y : \exists x \in A \ (y = f(x))\}$

• If $B \subseteq Y$ then the pull-back (or inverse image) of $B$ along $f$ is the set $f^{-1}[B] \subseteq X$
defined by

$f^{-1}[B] := \{x \in X : f(x) \in B\}$

Thus

$x \in f^{-1}[B] \iff f(x) \in B$
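For finite sets, both operations can be written down directly. The following sketch represents a function $f : X \to Y$ as a Python dictionary; the helper names `image` and `preimage` and the small example map are ours, chosen only for illustration.

```python
def image(f, A):
    """Push-forward f[A] = {f(x) : x in A}, for f given as a dict on X."""
    return {f[x] for x in A}


def preimage(f, B):
    """Pull-back f^{-1}[B] = {x in X : f(x) in B}; no inverse function is needed."""
    return {x for x in f if f[x] in B}


# Hypothetical example: f is neither one-to-one nor onto {"p", "q", "r"}.
f = {1: "p", 2: "q", 3: "p"}

direct = image(f, {1, 3})       # both points are sent to "p"
pulled = preimage(f, {"p"})     # every point sent to "p"
empty = preimage(f, {"r"})      # nothing is sent to "r"
```

Note that `preimage` only ever evaluates `f` itself, which is why it makes sense even though this `f` has no inverse.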






Remarks 2.6.2 IMPORTANT: The definition of the inverse image does not require the existence of an
inverse function, i.e. it is not necessary that the function $f^{-1}$ exists^* in the definition of inverse image. Observe
that the definition of $f^{-1}[B]$, namely $\{x \in X : f(x) \in B\}$, involves only the function $f$ (and does not involve
$f^{-1}$) and is therefore well-defined if $f$ is.

When $f : X \to Y$ is invertible, however, then the inverse image of $B \subseteq Y$ along $f$ and the direct image of
$B$ along $f^{-1} : Y \to X$ are easily seen to coincide.

□ 

Exercises 2.6.3 (a) Let $X := \{0, 1, 2, 3, 4\}$, $Y := \{a, b, c, d\}$ and define $f : X \to Y$ by

$0 \mapsto a$,  $1 \mapsto d$,  $2 \mapsto c$,  $3 \mapsto a$,  $4 \mapsto c$

Determine the direct images of the following sets:

$A_1 := \{0\}$,  $A_2 := \{0, 1\}$,  $A_3 := X$

Determine the inverse images of the following sets:

$B_1 := \{a\}$,  $B_2 := \{b\}$,  $B_3 := \{a, b, c\}$,  $B_4 :=$

(b) Let $g : \mathbb{R}^2 \to \mathbb{R} : (x, y) \mapsto x^2 + y^2$. Determine

$g^{-1}\{1\}$,  $g^{-1}[0, 1]$,  $g^{-1}(-\infty, 0)$

(c) Let $h : \mathbb{R} \to \mathbb{R} : x \mapsto x^2$. Determine

$h^{-1}[1, 2]$,  $h[1, 2]$

□ 

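Note that if $f : X \to Y$, then $f[\cdot]$ is a function which assigns to every subset of $X$ a subset
of $Y$, i.e.

$f[\cdot] : \mathcal{P}(X) \to \mathcal{P}(Y) : A \mapsto f[A]$

Similarly, $f^{-1}[\cdot]$ is a function which assigns to every subset of $Y$ a subset of $X$, i.e.

$f^{-1}[\cdot] : \mathcal{P}(Y) \to \mathcal{P}(X) : B \mapsto f^{-1}[B]$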

Lemma 2.6.4 If $f : X \to Y$, $A \subseteq X$ and $B \subseteq Y$, then

$f[A] \subseteq B$ iff $A \subseteq f^{-1}[B]$

Proof: ($\Rightarrow$): Suppose that $f[A] \subseteq B$. If $x \in A$, then $f(x) \in f[A]$. Hence $f(x) \in B$, and thus
$x \in f^{-1}[B]$. Thus $x \in A$ implies $x \in f^{-1}[B]$.

($\Leftarrow$): Suppose that $A \subseteq f^{-1}[B]$. If $y \in f[A]$, then there is $x \in A$ such that $f(x) = y$. But
$x \in A$ implies $f(x) \in B$, and hence $y \in B$. Thus $y \in f[A]$ implies $y \in B$.

∎

The pull-back function $f^{-1}[\cdot]$ is particularly well-behaved with respect to the algebraic
set operations:

^* Recall that $f^{-1}$ exists iff $f$ is a bijection, i.e. both one-to-one and onto.






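Proposition 2.6.5 Suppose that $f : X \to Y$ is a function, and that $B_i \subseteq Y$ for $i \in I$.

(a) $f^{-1}[\bigcup_{i \in I} B_i] = \bigcup_{i \in I} f^{-1}[B_i]$.

(b) $f^{-1}[\bigcap_{i \in I} B_i] = \bigcap_{i \in I} f^{-1}[B_i]$.

(c) $f^{-1}[B]^c = f^{-1}[B^c]$. (Here $f^{-1}[B]^c$ is the complement of $f^{-1}[B]$ in $X$, whereas $B^c$ is
the complement of $B$ in $Y$.)



Proof: (a)

$x \in f^{-1}[\bigcup_i B_i]$
iff $f(x) \in \bigcup_i B_i$
iff $(\exists i \in I)(f(x) \in B_i)$
iff $(\exists i \in I)(x \in f^{-1}[B_i])$
iff $x \in \bigcup_i f^{-1}[B_i]$

∎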
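The three identities of Propn. 2.6.5 are easy to sanity-check on a small finite example. This is a sketch only: the dictionary `f`, the sets `B1`, `B2` and the helper `preimage` are hypothetical choices of ours, and a single example is of course no substitute for the proof.

```python
def preimage(f, B):
    """Pull-back f^{-1}[B] for a function f given as a dict on its domain X."""
    return {x for x in f if f[x] in B}


f = {0: "u", 1: "v", 2: "u", 3: "w"}   # hypothetical map from X into Y
X = set(f)
Y = {"u", "v", "w", "z"}
B1, B2 = {"u", "v"}, {"v", "w"}

# (a) pull-back of a union is the union of the pull-backs
union_ok = preimage(f, B1 | B2) == preimage(f, B1) | preimage(f, B2)
# (b) pull-back of an intersection is the intersection of the pull-backs
inter_ok = preimage(f, B1 & B2) == preimage(f, B1) & preimage(f, B2)
# (c) complement in X of the pull-back equals the pull-back of the complement in Y
compl_ok = X - preimage(f, B1) == preimage(f, Y - B1)
```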

Exercise 2.6.6 (a) Prove the remainder of Propn. 2.6.5.

(b) Investigate the analogous results for the push-forward function $f[\cdot]$.

□ 

2.6.2 Topological Characterizations of Convergence and Continuity 

We are now able to characterize notions related to convergence and continuity purely in terms 
of neighbourhoods and open sets. First, we show how to define the notion of convergence: 



Proposition 2.6.7 Let $(x_n)_n$ be a sequence in a metric space $(X, d)$ and let $x \in X$. Then
the following are equivalent:

(i) $x_n \to x$.

(ii) For any neighbourhood $A$ of $x$, we have $x_n \in A$ eventually.



Proof: (i) $\Rightarrow$ (ii): Suppose that $x_n \to x$, and that $A$ is a neighbourhood of $x$. Choose $r > 0$
so that $B(x, r) \subseteq A$, and then choose $N$ such that $d(x_n, x) < r$ whenever $n \ge N$. Then
$x_n \in A$ whenever $n \ge N$.

(ii) $\Rightarrow$ (i): Let $\varepsilon > 0$. Then $B(x, \varepsilon)$ is a neighbourhood of $x$. Hence there exists $N \in \mathbb{N}$ such
that $x_n \in B(x, \varepsilon)$ whenever $n \ge N$. Thus $d(x_n, x) < \varepsilon$ whenever $n \ge N$.

∎

Theorem 2.6.8 Let $(X, d_X)$ and $(Y, d_Y)$ be metric spaces, and let $f : X \to Y$.

(a) $f$ is continuous at $x_0 \in X$ iff for every (open) neighbourhood $V$ of $f(x_0)$ there exists
an (open) neighbourhood $U$ of $x_0$ such that $f[U] \subseteq V$.

(b) $f$ is everywhere continuous iff whenever $V$ is an open subset of $Y$, then $f^{-1}[V]$ is an
open subset of $X$.






Proof: (a): Suppose that $f : X \to Y$ is continuous at the point $x_0 \in X$, and let $V$ be
a neighbourhood of $f(x_0) \in Y$. By definition of neighbourhood there is $\varepsilon > 0$ such that
$B_Y(f(x_0), \varepsilon) \subseteq V$. Because $f$ is continuous at $x_0$, there is $\delta > 0$ such that $d_X(x_0, x) < \delta$ implies
$d_Y(f(x_0), f(x)) < \varepsilon$. Thus $U := B_X(x_0, \delta)$ is an open neighbourhood of $x_0$ with the property
that $f[U] \subseteq V$.

Conversely, suppose that for every (open) neighbourhood $V$ of $f(x_0)$ there exists an (open)
neighbourhood $U$ of $x_0$ such that $f[U] \subseteq V$. To prove that $f$ is continuous at $x_0$ it suffices to
show that whenever $x_n \to x_0$, we must have $f(x_n) \to f(x_0)$. So suppose that $x_n \to x_0$ in $X$.
Let $V$ be an arbitrary neighbourhood of $f(x_0)$, and choose a neighbourhood $U$ of $x_0$ such that
$f[U] \subseteq V$. Since $x_n \in U$ eventually by Propn. 2.6.7, we must have $f(x_n) \in V$ eventually. It
follows from Propn. 2.6.7 that $f(x_n) \to f(x_0)$.

(b): Suppose that $f$ is continuous, and that $V \subseteq Y$ is open. If $x_0 \in f^{-1}[V]$, then $V$
is an open neighbourhood of $f(x_0)$, and so there is an open neighbourhood $U$ of $x_0$ such
that $f[U] \subseteq V$. Then $x_0 \in U \subseteq f^{-1}[V]$, and hence $x_0$ is an interior point of $f^{-1}[V]$. Since
$x_0 \in f^{-1}[V]$ was arbitrary, every point of $f^{-1}[V]$ is an interior point, i.e. $f^{-1}[V]$ is open.

Conversely, suppose that pull-backs of open sets are open. If $x_0 \in X$ and $V$ is an open
neighbourhood of $f(x_0)$, then $U := f^{-1}[V]$ is an open neighbourhood of $x_0$ with the property
that $f[U] \subseteq V$. By (a), $f$ is continuous at any $x_0 \in X$.

∎

2.6.3 Continuity and Compactness 

The following result has many important applications: 

Theorem 2.6.9 If $f : (X, d_X) \to (Y, d_Y)$ is continuous and $K$ is a compact subset of $X$,
then $f[K]$ is a compact subset of $Y$.

We will give two proofs, one in terms of sequences, and one in terms of open covers: 

Exercise 2.6.10 Use the equivalence of compactness and sequential compactness to give a proof of Thm. 
2.6.9, i.e. prove that if / is continuous and K sequentially compact, then f[K] is sequentially compact. 

□ 

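Alternate Proof of Thm. 2.6.9: Let $\mathcal{V} := \{V_i : i \in I\}$ be an open cover of $f[K]$ in $Y$. We
show that there exist finitely many $i_1, \dots, i_m \in I$ such that $f[K] \subseteq V_{i_1} \cup \dots \cup V_{i_m}$. Define
$U_i := f^{-1}[V_i]$ for $i \in I$, and observe that each $U_i$ is open, by Thm. 2.6.8. Moreover

$x \in K \ \Rightarrow \ f(x) \in f[K] \ \Rightarrow \ \exists i \in I \ (f(x) \in V_i) \ \Rightarrow \ \exists i \in I \ (x \in f^{-1}[V_i]) \ \Rightarrow \ x \in \bigcup_i U_i$

Hence $\mathcal{U} := \{U_i : i \in I\}$ is an open cover of $K$. As $K$ is compact, there exist $i_1, \dots, i_m \in I$
such that $K \subseteq U_{i_1} \cup \dots \cup U_{i_m}$. Now $f[K] \subseteq f[\bigcup_{k=1}^{m} U_{i_k}] = f[f^{-1}[\bigcup_{k=1}^{m} V_{i_k}]] \subseteq \bigcup_{k=1}^{m} V_{i_k}$.

∎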

The above theorem has the following important consequences for optimization. It states that 
a continuous function possesses a maximum and minimum on any compact set: 






Corollary 2.6.11 Suppose that $f : (X, d_X) \to \mathbb{R}$ is continuous, and that $K$ is a compact
subset of $X$. Then $f$ attains its supremum and infimum on $K$, i.e. there exist $k^*, k_* \in K$
such that

$f(k^*) = \sup f[K] = \sup_{k \in K} f(k)$,  $f(k_*) = \inf f[K] = \inf_{k \in K} f(k)$



Proof: $C := f[K]$ is compact, and hence closed and bounded. From the fact that $C$ is
bounded, it follows that $\sup C$ and $\inf C$ exist and are finite (by the Completeness Axiom).
Since $C$ is closed, we must have $\sup C, \inf C \in C$. (To see, e.g., that $c^* := \sup C \in C$,
note that for every $n \in \mathbb{N}$ there is $c_n \in C$ such that $c^* - \frac{1}{n} < c_n \le c^*$, so that $(c_n)_n$ is a
sequence in $C$ with $c_n \to c^*$. Closedness of $C$ now guarantees that $c^* \in C$.) In particular,
$\sup f[K] \in f[K]$, i.e. there is $k^* \in K$ so that $\sup f[K] = f(k^*)$. The same holds for $\inf f[K]$.

∎

Finally, we show that continuous functions on compact sets possess a stronger property, 
that of uniform continuity: 

Definition 2.6.12 Let $(X, d_X)$, $(Y, d_Y)$ be metric spaces and let $A \subseteq X$. A function
$f : A \to Y$ is said to be uniformly continuous on $A$ if and only if

$\forall \varepsilon > 0 \ \exists \delta > 0 \ \forall x, y \in A \ [d_X(x, y) < \delta \Rightarrow d_Y(f(x), f(y)) < \varepsilon]$

Remarks 2.6.13 Note the difference between the definition of continuity and uniform continuity: $f :
(X, d_X) \to (Y, d_Y)$ is continuous iff it is continuous at every $x \in X$, i.e. iff

$\forall x \in X \ \forall \varepsilon > 0 \ \exists \delta > 0 \ \forall y \in X \ [d_X(x, y) < \delta \Rightarrow d_Y(f(x), f(y)) < \varepsilon]$

Thus given an $x \in X$ and an $\varepsilon$, we can find a $\delta$, and that $\delta$ may depend on both $\varepsilon$ and $x$.
On the other hand, $f : (X, d_X) \to (Y, d_Y)$ is uniformly continuous iff

$\forall \varepsilon > 0 \ \exists \delta > 0 \ \forall x, y \in X \ [d_X(x, y) < \delta \Rightarrow d_Y(f(x), f(y)) < \varepsilon]$

Thus given an $\varepsilon$ we can find a $\delta$, and that $\delta$ will work uniformly for all $x$. It does not make sense to talk
about uniform continuity at a point: we are discussing all $x$ in a given domain at once.

□ 

Exercise 2.6.14 (a) Show that if $f$ is uniformly continuous, then it is continuous at any point.

(b) Show that any affine function $f : \mathbb{R} \to \mathbb{R} : x \mapsto ax + b$ is uniformly continuous on all of $\mathbb{R}$.

(c) Show that the function $g : \mathbb{R} \to \mathbb{R} : x \mapsto x^2$ is not uniformly continuous on $\mathbb{R}$, but that it is uniformly
continuous on $[0, 1]$ (or, indeed, any bounded subset of $\mathbb{R}$).

(d) Show that the function $h : (0, \infty) \to \mathbb{R} : x \mapsto \frac{1}{x}$ is uniformly continuous on $(r, \infty)$ for any $r > 0$, but not
uniformly continuous on $(0, \infty)$.

[Hints: (c) Given $\delta > 0$, note that $|(x + \delta)^2 - x^2| \ge 2\delta|x|$ for $x \ge 0$, which can be made arbitrarily large by choosing
$x$ sufficiently large. Conclude that, with $\varepsilon = 1$ (or any other value $> 0$), there is no $\delta > 0$ such that
$|g(y) - g(x)| < \varepsilon$ for all $x, y$ with $|x - y| < \delta$.]



□ 
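The hint for part (c) can be checked numerically. In the sketch below (our own illustration, with the hypothetical helper `gap` and an arbitrarily chosen $\delta = 10^{-3}$), the two points $x$ and $x + \delta/2$ are always closer than $\delta$, yet the difference of their squares, $\delta x + \delta^2/4$, grows without bound as $x$ increases.

```python
def gap(g, x, delta):
    """|g(x + delta/2) - g(x)| for two points at distance delta/2 < delta."""
    return abs(g(x + delta / 2) - g(x))


def square(x):
    return x * x


delta = 1e-3
# The same delta is used at x = 1, 10^3 and 10^6.
gaps = [gap(square, x, delta) for x in (1.0, 1e3, 1e6)]
```

With $\varepsilon = 1$, the first gap is far below $\varepsilon$ while the last is far above it, so no single $\delta$ serves all of $\mathbb{R}$.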






2.7 Separable Spaces 

Intuitively, a subset $D$ of a metric space $(X, d)$ is dense when it is "spread everywhere" within
the space $X$. This does not mean that $D = X$. For example, the set $\mathbb{Q}$ of rational numbers
is spread everywhere within the space $\mathbb{R}$ of reals: Between any two real numbers one can
find a rational number. As we have seen, $\mathbb{R}$ is much bigger than $\mathbb{Q}$, i.e. $\mathbb{Q}$ is quite "small":
$\mathbb{Q}$ is countable, $\mathbb{R}$ is not. The fact that every real number can be approximated arbitrarily
closely by rational numbers can be very useful, as we need deal only with a much smaller set.
It deserves to be formalized:

Definition 2.7.1 Let $(X, d)$ be a metric space. A subset $D \subseteq X$ is said to be dense in $X$
if

$D \cap U \ne \emptyset$ for every non-empty open set $U$

A metric space is said to be separable if it has a countable dense subset.



Examples 2.7.2 (a) $\mathbb{R}^d$ is separable, as $\mathbb{Q}^d = \mathbb{Q} \times \dots \times \mathbb{Q}$ is dense in $\mathbb{R}^d$ and countable, by Propn. 2.1.9.

(b) Every compact space is separable: For suppose $(X, d)$ is compact. Then, for every $n \in \mathbb{N}$, there exist
finitely many open balls of radius $\frac{1}{n}$ which cover $X$. Let $D_n$ be the set of centers of those finitely many
open balls, and let $D := \bigcup_n D_n$. Then $D$ is a countable union of finite sets, and hence countable, by
Propn. 2.1.9. We claim that $D$ is dense. For suppose that $U \subseteq X$ is non-empty and open. Let $x \in U$, and
choose $n \in \mathbb{N}$ so that $B(x, \frac{1}{n}) \subseteq U$. As $\bigcup_{y \in D_n} B(y, \frac{1}{n})$ covers $X$, there is $d \in D_n$ such that $x \in B(d, \frac{1}{n})$.
Clearly, then $d \in B(x, \frac{1}{n})$, so that $d \in D \cap B(x, \frac{1}{n}) \subseteq D \cap U$. Hence $D \cap U \ne \emptyset$.

□ 

2.8 The Banach Space C[a,b] 

We investigate, in this section, the space $(C[a, b], \|\cdot\|_\infty)$ of all continuous functions $f : [a, b] \to
\mathbb{R}$, equipped with the sup-norm: $\|f\|_\infty := \sup_{x \in [a,b]} |f(x)|$. We will first investigate the notion of
convergence in this space (uniform convergence) and show that it is a Banach space (i.e. a
complete normed space). Then we give a criterion for a subset of this space to be compact:
the Arzela-Ascoli Theorem. Finally, we show that $C[a, b]$ is separable: The set of polynomials
with rational coefficients is dense in $C[a, b]$.

2.8.1 Uniform Convergence 

Let $(X, d)$ be a metric space, and let $C(X)$ be the set of all continuous functions $f : X \to \mathbb{R}$.
We will be interested mainly in the case where $X = [a, b]$ is a compact interval in $\mathbb{R}$.
$C(X)$ inherits a lot of structure from $\mathbb{R}$. It is easy to see that

• $C(X)$ is a linear space, when addition and scalar multiplication are defined pointwise,
i.e.

$(f + g)(x) := f(x) + g(x)$,  $(af)(x) := a \cdot f(x)$  for $x \in X$, $a \in \mathbb{R}$

• When $X$ is compact, the function $\|\cdot\|_\infty : C(X) \to \mathbb{R}$ defined by

$\|f\|_\infty := \sup_{x \in X} |f(x)|$

is a norm on $C(X)$. This norm is called the uniform norm or the sup norm.

[Note that $\|f\|_\infty < \infty$ when $f$ is continuous, because $X$ is compact; cf. Corollary 2.6.11.]

The elements of $C(X)$ are functions, so two notions of convergence suggest themselves:



Definition 2.8.1 Let $f_n, f \in C(X)$ for $n \in \mathbb{N}$.

(a) We say that $f_n \to f$ pointwise if and only if $f_n(x) \to f(x)$ for all $x \in X$.

(b) We say that $f_n \to f$ uniformly if and only if it converges w.r.t. the uniform norm, i.e.
iff $\|f_n - f\|_\infty \to 0$.

□ 



Examples 2.8.2 (a) Suppose that $f_n : [0, 1] \to \mathbb{R} : x \mapsto x(x + \frac{1}{n})$. Then as $n \to \infty$, we have $f_n(x) \to x^2$.
So if $f(x) := x^2$, then $f_n \to f$ pointwise on $[0, 1]$.

We also have $\|f_n - f\|_\infty = \sup_{0 \le x \le 1} |\frac{x}{n}| = \frac{1}{n}$, so that $\|f_n - f\|_\infty \to 0$. Hence we also have $f_n \to f$ uniformly.

(b) Let $f_n : [0, 1] \to \mathbb{R} : x \mapsto x^n$. Then

$\lim_n f_n(x) = 0$ if $0 \le x < 1$,  $\lim_n f_n(x) = 1$ if $x = 1$

Hence $f_n \to f$ pointwise on $[0, 1]$, where $f : [0, 1] \to \mathbb{R}$ is defined by

$f(x) = 0$ if $0 \le x < 1$,  $f(x) = 1$ if $x = 1$

On the other hand

$\|f_n - f\|_\infty = \sup_{0 \le x < 1} |x^n| = 1$ for all $n$

Hence $f_n \not\to f$ uniformly.

□ 
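Both examples can be explored numerically by approximating the sup norm with a maximum over a fine grid. This is a sketch under that assumption: the grid size and the helper names `sup_distance`, `error_a`, `error_b` are ours, and a grid maximum only bounds the true supremum from below.

```python
def sup_distance(f, g, grid):
    """Grid approximation of the sup-norm distance between f and g."""
    return max(abs(f(x) - g(x)) for x in grid)


GRID = [k / 10_000 for k in range(10_001)]  # 0, 0.0001, ..., 1


def error_a(n):
    """Example (a): f_n(x) = x(x + 1/n) against the limit x^2."""
    return sup_distance(lambda x: x * (x + 1 / n), lambda x: x * x, GRID)


def error_b(n):
    """Example (b): f_n(x) = x^n against the discontinuous pointwise limit."""
    def limit(x):
        return 1.0 if x == 1.0 else 0.0
    return sup_distance(lambda x: x ** n, limit, GRID)


for n in (1, 10, 100):
    print(n, error_a(n), error_b(n))
```

In (a) the distance behaves like $\frac{1}{n}$, whereas in (b) it stays close to 1 for every $n$ tried, in line with the failure of uniform convergence.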



Exercise 2.8.3 For $n \in \mathbb{N}$, define $f_n : [0, 1] \to \mathbb{R}$ by

(a) Show that $f_n$ converges pointwise, and determine the limit function $f = \lim_n f_n$.

(b) Does $f_n \to f$ uniformly?

□ 

How do pointwise and uniform convergence differ? Look at the $\varepsilon$-forms of the definitions:

$f_n \to f$ pointwise iff $(\forall x \in X)(\forall \varepsilon > 0)(\exists N \in \mathbb{N})(\forall n \in \mathbb{N})[n \ge N \Rightarrow |f_n(x) - f(x)| < \varepsilon]$
$f_n \to f$ uniformly iff $(\forall \varepsilon > 0)(\exists N \in \mathbb{N})(\forall n \in \mathbb{N})(\forall x \in X)[n \ge N \Rightarrow |f_n(x) - f(x)| < \varepsilon]$

The difference between these notions of convergence is clear: The order of the quantifiers is
different. In the case of pointwise convergence, for each $\varepsilon$ and each $x$, we can find an $N$ such
that. . . . That $N$ may depend on $\varepsilon$ as well as $x$.

In the case of uniform convergence, for each $\varepsilon$ we can find an $N$ such that. . . That $N$ may
depend on $\varepsilon$, but not on a particular $x$: We can find an $N$ which works for all $x$ simultaneously.






Remarks 2.8.4 When $X$ is a compact subinterval of $\mathbb{R}$, the concept of uniform convergence has a neat
graphical interpretation: $f_n \to f$ uniformly if and only if, for every $\varepsilon > 0$, the graphs of the $f_n$ lie eventually within a band of
radius $\varepsilon$ about the graph of $f$, i.e. if and only if at most finitely many of the $f_n$ have some portion of their
graphs lying outside this band.

□ 



Proposition 2.8.5 If $f_n \to f$ uniformly, then $f_n \to f$ pointwise.

Exercise 2.8.6 Prove Propn. 2.8.5.

□

In Examples 2.8.2 and Exercise 2.8.3 we saw that the pointwise limit of continuous functions
need not be continuous. This "pathology" cannot happen with uniform convergence:

Theorem 2.8.7 Suppose that $f_n \to f$ uniformly on $X$. If each $f_n$ is continuous at $x_0 \in X$,
then $f$ is also continuous at $x_0$.

Proof: Let $\varepsilon > 0$. We must find a $\delta > 0$ such that $|f(x) - f(x_0)| < \varepsilon$ whenever $d(x, x_0) < \delta$
(for $x \in X$).

First, by uniform convergence, we may choose $N$ such that $|f_n(x) - f(x)| < \frac{\varepsilon}{3}$ for all
$x \in X$, whenever $n \ge N$.

Then, because $f_N$ is continuous at $x_0$, we may choose $\delta > 0$ such that $|f_N(x) - f_N(x_0)| < \frac{\varepsilon}{3}$
whenever $d(x, x_0) < \delta$.

It follows that

$|f(x) - f(x_0)| \le |f(x) - f_N(x)| + |f_N(x) - f_N(x_0)| + |f_N(x_0) - f(x_0)| < \frac{\varepsilon}{3} + \frac{\varepsilon}{3} + \frac{\varepsilon}{3} = \varepsilon$

whenever $d(x, x_0) < \delta$.

∎



Corollary 2.8.8 The uniform limit of continuous functions is a continuous function. 



Theorem 2.8.9 (Cauchy criterion for uniform convergence)

Suppose that $(f_n)_n$ is a sequence of functions with common domain $X$. There exists a
function $f$ such that $f_n \to f$ uniformly on $X$ if and only if

$(\forall \varepsilon > 0)(\exists N \in \mathbb{N})(\forall n, m \ge N)(\forall x \in X)[|f_n(x) - f_m(x)| < \varepsilon]$

i.e. for every $\varepsilon > 0$ we can find an $N$ such that whenever $n, m \ge N$ and $x \in X$, we have
$|f_n(x) - f_m(x)| < \varepsilon$.

Exercise 2.8.10 Prove Theorem 2.8.9. 

□ 

We can now prove the following theorem: 

Theorem 2.8.11 If $(X, d)$ is a compact metric space, then $(C(X), \|\cdot\|_\infty)$ is a Banach
space.






Proof: We already know that $(C(X), \|\cdot\|_\infty)$ is a normed space, so it remains to check that it
is complete (w.r.t. the sup norm).

If $(f_n)_n$ is a Cauchy sequence in $C(X)$, then for all $x \in X$, the component sequence $(f_n(x))_n$
is a Cauchy sequence in $\mathbb{R}$. As $\mathbb{R}$ is complete, each component sequence converges to some
real number; call it $f(x)$. This convergence is also uniform: If $x \in X$, then $|f_n(x) - f(x)| =
\lim_m |f_n(x) - f_m(x)| \le \limsup_m \|f_n - f_m\|_\infty$, and hence $\|f_n - f\|_\infty \le \limsup_m \|f_n - f_m\|_\infty$.
Since $(f_n)_n$ is a Cauchy sequence in $C(X)$, we have $\lim_{m,n \to \infty} \|f_n - f_m\|_\infty = 0$, and so
$\|f_n - f\|_\infty \to 0$ also.

Since the uniform limit of continuous functions is again continuous, we see that $f \in C(X)$,
and thus that $C(X)$ is complete when equipped with the sup norm.

∎

Exercise 2.8.12 (a) Show that $C[0, 1]$ is an infinite dimensional space.
[Hint: Consider the functions $f_n(x) := x^n$.]

(b) Show that the closed unit ball $\bar{B}(0, 1) := \{f \in C[0, 1] : \|f\|_\infty \le 1\}$ is a closed and bounded set which is
not compact.

[Hint: The functions $f_n(x) := x^n$ form a sequence in $\bar{B}(0, 1)$ which has no convergent subsequence.]

□ 



Lemma 2.8.13 (Dini's Theorem) If $f_n, f \in C(K)$ and $f_n \downarrow f$, then $f_n \to f$ uniformly.



Proof: Let $\varepsilon > 0$, and define $G_n := \{x \in K : f_n(x) - f(x) < \varepsilon\}$. Each $G_n$ is open, $G_1 \subseteq G_2 \subseteq G_3 \subseteq \dots$ and
$\bigcup_n G_n = K$. Since $K$ is compact, there is $n_0$ such that $G_{n_0} = K$. Then $|f_n(x) - f(x)| < \varepsilon$
for all $n \ge n_0$ and all $x \in K$.

∎



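Definition 2.8.14 (i) A subset $\mathcal{F} \subseteq C(K)$ is said to be equicontinuous at a point
$x_0 \in K$ if and only if $\sup_{f \in \mathcal{F}} |f(x) - f(x_0)| \to 0$ as $x \to x_0$, i.e. iff

$\forall \varepsilon > 0 \ \exists \delta > 0 \ \forall f \in \mathcal{F} \ \forall x \in K \ [d(x, x_0) < \delta \Rightarrow |f(x) - f(x_0)| < \varepsilon]$

(This is just the usual definition of continuity of $f$ at $x_0$, but here we require that
for a given $\varepsilon > 0$ there is a $\delta$ which "works" for all $f \in \mathcal{F}$ simultaneously.)

(ii) $\mathcal{F} \subseteq C(K)$ is said to be equicontinuous iff it is equicontinuous at every $x_0 \in K$.

(iii) $\mathcal{F} \subseteq C(K)$ is said to be uniformly equicontinuous iff

$\forall \varepsilon > 0 \ \exists \delta > 0 \ \forall f \in \mathcal{F} \ \forall x, y \in K \ [d(x, y) < \delta \Rightarrow |f(x) - f(y)| < \varepsilon]$

(This is just the usual definition of uniform continuity of $f$, but here we require that
for a given $\varepsilon > 0$ there is a $\delta$ which "works" for all $f \in \mathcal{F}$ simultaneously.)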



The following lemma extends the result that continuous functions are uniformly continuous 
on compact sets: 






Lemma 2.8.15 If $(K, d)$ is a compact metric space, then any equicontinuous family is
uniformly equicontinuous.



Proof: Suppose $\mathcal{F} \subseteq C(K)$ is equicontinuous, but not uniformly equicontinuous. Then there
exist $\varepsilon > 0$, $x_n, y_n \in K$ and $f_n \in \mathcal{F}$ such that for all $n \in \mathbb{N}$ we have

$d(x_n, y_n) < \frac{1}{n}$ yet $|f_n(x_n) - f_n(y_n)| \ge \varepsilon$

Since $K$ is compact, the sequence $(x_n)_n$ has a convergent subsequence $x_{n_k} \to x$, so we may
w.l.o.g. assume $x_n \to x$. Then also $y_n \to x$. As $\mathcal{F}$ is equicontinuous at $x$, we have, for
sufficiently large $n$, that

$|f_n(y_n) - f_n(x)| < \frac{\varepsilon}{2}$ and $|f_n(x_n) - f_n(x)| < \frac{\varepsilon}{2}$

from which we obtain the contradiction $|f_n(x_n) - f_n(y_n)| < \varepsilon$.

∎

A related notion, which may help to clarify the above, is the modulus of continuity. This
is a function $m$ on $C(K) \times \mathbb{R}^+$ defined by

$m(f, \delta) := \sup\{|f(x) - f(y)| : x, y \in K, \ d(x, y) < \delta\}$

To say that $f$ is uniformly continuous is to say that $\lim_{\delta \downarrow 0} m(f, \delta) = 0$; and to say that $\mathcal{F}$ is
uniformly equicontinuous is to say that $\lim_{\delta \downarrow 0} \sup_{f \in \mathcal{F}} m(f, \delta) = 0$.
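On a finite grid the modulus of continuity can be estimated by brute force. The sketch below assumes $K = [0, 1]$, sampled at 101 equally spaced points; the function name `modulus` and the test functions are our own, and the grid value is only a lower estimate of the true supremum.

```python
import math


def modulus(f, delta, grid):
    """Grid estimate of m(f, delta) = sup{|f(x) - f(y)| : |x - y| < delta}."""
    return max(abs(f(x) - f(y)) for x in grid for y in grid if abs(x - y) < delta)


grid = [k / 100 for k in range(101)]

m_identity = modulus(lambda x: x, 0.1, grid)   # at most 0.1 for the identity map
m_sqrt_coarse = modulus(math.sqrt, 0.1, grid)  # the square root is steepest near 0
m_sqrt_fine = modulus(math.sqrt, 0.02, grid)   # shrinking delta shrinks the modulus
```

For a uniformly continuous function the estimates decrease towards 0 as $\delta \downarrow 0$, which is the statement $\lim_{\delta \downarrow 0} m(f, \delta) = 0$ in computational form.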

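Note that, for fixed $\delta > 0$, the function $m(\cdot, \delta) : C[0, 1] \to \mathbb{R} : f \mapsto m(f, \delta)$ is continuous:
Indeed, using the inequality $|a| - |b| \le \big| |a| - |b| \big| \le |a - b|$, we see that

$m(f_1, \delta) - m(f_2, \delta) \le \sup_{x, y \in K, \ d(x,y) < \delta} \big( |f_1(x) - f_1(y)| - |f_2(x) - f_2(y)| \big)$

$\le \sup_{x, y \in K, \ d(x,y) < \delta} \big| (f_1(x) - f_2(x)) - (f_1(y) - f_2(y)) \big|$

$\le 2 \|f_1 - f_2\|_\infty$

from which the result follows. Combined with Dini's lemma, we thus see that if $n \to \infty$, then
$m(\cdot, \frac{1}{n}) \to 0$ uniformly on any compact subset of $C[0, 1]$.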

2.8.2 Compactness: Arzela-Ascoli Theorem*

The Arzela-Ascoli Theorem characterizes relative compactness in the space $C[0, 1]$:

Theorem 2.8.16 (Arzela-Ascoli)

Let $\mathcal{F} \subseteq C[0, 1]$. Then $\mathcal{F}$ is relatively compact if and only if
(i) $\sup_{f \in \mathcal{F}} |f(0)| < \infty$, and
(ii) $\lim_{\delta \to 0} \sup_{f \in \mathcal{F}} m(f, \delta) = 0$
(if and only if $\mathcal{F}$ is uniformly bounded and uniformly equicontinuous.)






Proof: First suppose that $\mathcal{F}$ is relatively compact, i.e. totally bounded. Then it is obviously
bounded in $C[0, 1]$, and so

$\sup_{f \in \mathcal{F}} |f(0)| \le \sup_{f \in \mathcal{F}} \|f\|_\infty < \infty$

which yields (i). To see (ii), recall that the modulus of continuity $m(f, \delta)$ is continuous in $f$.
Any continuous function $f \in C[0, 1]$ is uniformly continuous, i.e. has $m(f, \frac{1}{n}) \to 0$ as $n \to \infty$.
Define $F_n : \bar{\mathcal{F}} \to \mathbb{R} : f \mapsto m(f, \frac{1}{n})$. Then $F_n \downarrow 0$ pointwise. By Dini's lemma, since $\bar{\mathcal{F}}$ is
compact, we must have $F_n \to 0$ uniformly, i.e. $\sup_{f \in \mathcal{F}} |F_n(f)| \to 0$, from which (ii) follows.

For the converse, suppose that (i), (ii) hold for $\mathcal{F} \subseteq C[0, 1]$. We must show that $\mathcal{F}$ is
relatively compact. First, we show that $\mathcal{F}$ is uniformly bounded, i.e. that there is $M < \infty$
such that $\|f\|_\infty \le M$ for all $f \in \mathcal{F}$. By (ii), choose $n$ large enough that $\sup_{f \in \mathcal{F}} m(f, \frac{1}{n}) < \infty$.
For $t \in [0, 1]$ we have

$|f(t)| \le |f(0)| + \sum_{k=1}^{n} \left| f(\tfrac{kt}{n}) - f(\tfrac{(k-1)t}{n}) \right|$

and thus

$M := \sup_{0 \le t \le 1} \sup_{f \in \mathcal{F}} |f(t)| < \infty$

It follows that $\mathcal{F}$ is uniformly bounded, as asserted.

To show that $\mathcal{F}$ is relatively compact, it suffices to show that it is totally bounded, as
$C[0, 1]$ is complete. Let $\varepsilon > 0$. Let $Y$ be a finite subset of $[-M, M]$ with the property that
any element of $[-M, M]$ lies within a distance of $< \varepsilon$ of some element of $Y$. Also choose $n$
sufficiently large that $m(f, \frac{1}{n}) < \varepsilon$ for all $f \in \mathcal{F}$. Let $\mathcal{P}$ be the set of all piecewise linear
continuous functions, which are linear on each interval $I_{kn} := [\frac{k-1}{n}, \frac{k}{n}]$ with endpoints taking
values in $Y$. The set $\mathcal{P}$ is clearly finite, as $Y$ is finite, and there are only finitely many intervals
$I_{kn}$, $k = 1, \dots, n$. We claim that

$\mathcal{F} \subseteq \bigcup_{p \in \mathcal{P}} B(p, 2\varepsilon)$

Indeed, if $f \in \mathcal{F}$, then there is $p \in \mathcal{P}$ such that $|f(\frac{k}{n}) - p(\frac{k}{n})| < \varepsilon$ for $k = 0, \dots, n$. If $t \in [0, 1]$,
there is $k$ such that $t \in I_{kn}$. Then $p(t)$, being linear on $I_{kn}$, is a convex combination of the
endpoints: $p(t) = \lambda p(\frac{k-1}{n}) + (1 - \lambda) p(\frac{k}{n})$ for some $0 \le \lambda \le 1$. Now

$|f(t) - p(\tfrac{k}{n})| \le |f(t) - f(\tfrac{k}{n})| + |f(\tfrac{k}{n}) - p(\tfrac{k}{n})| < 2\varepsilon$

and similarly $|f(t) - p(\frac{k-1}{n})| < 2\varepsilon$ as well. Thus

$|f(t) - p(t)| \le \lambda |f(t) - p(\tfrac{k-1}{n})| + (1 - \lambda) |f(t) - p(\tfrac{k}{n})| < 2\varepsilon$

This holds for all $t \in [0, 1]$, so $\|f - p\|_\infty < 2\varepsilon$ and hence $f \in B(p, 2\varepsilon)$, as required.

∎

2.8.3 Separability: Stone-Weierstrass Theorem*

Theorem 2.8.17 (Weierstrass Approximation Theorem) Suppose that $f : [a,b] \to \mathbb{R}$ is
continuous, and let $\varepsilon > 0$. Then there exists a polynomial $B(x)$ such that $\|f - B\|_\infty < \varepsilon$.




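Proof: We give a probabilistic proof, due to Bernstein. By scaling and translation, we may
assume without loss of generality that the interval $[a,b]$ is the unit interval $[0,1]$. For $p \in [0,1]$,
consider independent tosses of a coin which lands H with probability $p$. Let $X_k = 1$ if the $k$-th
toss lands H, and let $X_k = 0$ otherwise. Let $S_n := \sum_{k\le n} X_k$ be the number of H observed in
$n$ tosses. All of this is modeled on some probability space $(\Omega, \mathcal{F}, \mathbb{P})$. Then define

$$B_n(p) := \mathbb{E}\big[f(\tfrac{S_n}{n})\big] = \sum_{k=0}^{n} f(\tfrac{k}{n})\,\mathbb{P}(S_n = k) = \sum_{k=0}^{n} f(\tfrac{k}{n}) \binom{n}{k} p^k (1-p)^{n-k}$$

which is a polynomial in $p$. Now

$$|B_n(p) - f(p)| = \big|\mathbb{E}\big[f(\tfrac{S_n}{n}) - f(p)\big]\big|$$

and the law of large numbers implies $\tfrac{S_n}{n} \to p$ a.s., so that $f(\tfrac{S_n}{n}) \to f(p)$ a.s. by continuity. This suggests a proof: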

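Use compactness of $[0,1]$ and uniform continuity of $f$ to choose $K, \delta > 0$ so that

$$\sup_x |f(x)| \le K \qquad \sup_{|x-y|\le\delta} |f(x) - f(y)| \le \tfrac{1}{2}\varepsilon$$

Define also

$$D_n := \big|f(\tfrac{S_n}{n}) - f(p)\big| \qquad F_n := \{\omega \in \Omega : |\tfrac{S_n(\omega)}{n} - p| \le \delta\}$$

Note that $\omega \in F_n$ implies $0 \le D_n(\omega) \le \tfrac{1}{2}\varepsilon$ and that $D_n(\omega) \le 2K$ for all $\omega \in \Omega$. Further
observe that Chebyshev's inequality yields

$$\mathbb{P}(F_n^c) \le \frac{1}{\delta^2}\operatorname{Var}\big(\tfrac{S_n}{n}\big) = \frac{1}{n^2\delta^2}\operatorname{Var}\Big(\sum_{k\le n} X_k\Big) = \frac{1}{n\delta^2}(p - p^2) \le \frac{1}{4n\delta^2}$$

Thus

$$\begin{aligned}
|B_n(p) - f(p)| &\le \mathbb{E} D_n \\
&= \mathbb{E}[D_n; F_n] + \mathbb{E}[D_n; F_n^c] \\
&\le \tfrac{1}{2}\varepsilon + 2K\,\mathbb{P}(F_n^c) \le \tfrac{1}{2}\varepsilon + \frac{K}{2n\delta^2}
\end{aligned}$$

Let $N \in \mathbb{N}$ be least so that $\tfrac{1}{2}\varepsilon + \frac{K}{2N\delta^2} < \varepsilon$. Thus $|B_n(p) - f(p)| < \varepsilon$ for all $p \in [0,1]$ when
$n \ge N$, and hence $B_n \to f$ uniformly in $p$.

⊣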
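The Bernstein polynomials in the proof can be evaluated numerically. The following is a minimal Python sketch, assuming the usual explicit form $B_n(p) = \sum_k f(k/n)\binom{n}{k}p^k(1-p)^{n-k}$; the names `bernstein` and `sup_error` and the test function $|x - \tfrac{1}{2}|$ are illustrative choices, not part of the text, and the sup-norm is only approximated on a finite grid.

```python
from math import comb

def bernstein(f, n, p):
    """Evaluate B_n(p) = sum_k f(k/n) * C(n,k) * p^k * (1-p)^(n-k)."""
    return sum(f(k / n) * comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(n + 1))

def sup_error(f, n, grid=201):
    """Approximate ||B_n - f||_inf by sampling on an evenly spaced grid in [0, 1]."""
    points = [i / (grid - 1) for i in range(grid)]
    return max(abs(bernstein(f, n, p) - f(p)) for p in points)

def f(x):
    # Hypothetical test function: continuous, but not differentiable at 1/2.
    return abs(x - 0.5)

# The sampled error should shrink as n grows, in line with uniform convergence.
errors = [sup_error(f, n) for n in (10, 40, 160)]
print(errors)
```

The slow decay of the printed errors for this test function reflects the fact that the Chebyshev bound in the proof only gives a rate of order $1/n$ in the bad-event term.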

Recall that if $X$ is a compact topological space, then $C(X)$ is the set of all continuous
real-valued functions on $X$, equipped with the sup-norm $\|\cdot\|_\infty$. Further recall that

$$\max\{f,g\} = \tfrac{1}{2}(f + g + |f - g|) \qquad \min\{f,g\} = -\max\{-f,-g\} \qquad |f| = \max\{f,0\} + \max\{-f,0\}$$




Proposition 2.8.18 Let $K$ be a compact Hausdorff space, and let $\mathcal{F} \subseteq C(K)$ be a closed
set (w.r.t. the sup-norm). Suppose that

(1) $\mathcal{F}$ contains all the constant functions;

(2) $\mathcal{F}$ is closed under addition and multiplication;

(3) $\mathcal{F}$ is point separating, i.e. if $x \ne y \in K$, then there is $f \in \mathcal{F}$ such that $f(x) \ne f(y)$;

(4) $\mathcal{F}$ is closed under absolute values, i.e. $f \in \mathcal{F}$ implies $|f| \in \mathcal{F}$.

Then $\mathcal{F} = C(K)$.

Proof: Note that by (1), (2), (4) and the remarks preceding the statement of the proposition,
$f, g \in \mathcal{F}$ implies that $\max\{f,g\}$ and $\min\{f,g\}$ belong to $\mathcal{F}$.

Fix $h \in C(K)$, and $x \in K$. We first show that if $y \in K$, then there is $h_y \in \mathcal{F}$ such that

$$h_y(x) = h(x) \qquad h_y(y) = h(y)$$

This is obvious if $x = y$ (take $h_y$ to be the constant $h(x)$). If $x \ne y$, choose any $f \in \mathcal{F}$ such that $f(x) \ne f(y)$ (this is possible
because $\mathcal{F}$ is point separating) and define

$$a := \frac{h(x) - h(y)}{f(x) - f(y)} \qquad b := h(x) - a f(x)$$

Observe that $h_y := af + b$ has the required properties, and that (1), (2) imply that $h_y \in \mathcal{F}$.

Given $\varepsilon > 0$, the sets $U_y := \{v \in K : h_y(v) > h(v) - \varepsilon\}$ form an open cover $\{U_y : y \in K\}$
of $K$, as $y \in U_y$. Since $K$ is compact, there are $y_1, \dots, y_n \in K$ such that $U_{y_1}, \dots, U_{y_n}$ cover
$K$ also. Now define

$$g_x := \max\{h_{y_k} : k = 1, \dots, n\}$$

Then $g_x \in \mathcal{F}$, with

$$g_x(x) = h(x) \qquad g_x(v) > h(v) - \varepsilon \ \text{ for all } v \in K$$

This can be done for every $x \in K$. Now let $V_x := \{v \in K : g_x(v) < h(v) + \varepsilon\}$. Then
$\{V_x : x \in K\}$ forms an open cover of $K$, as $x \in V_x$. By compactness, there is a finite subcover
$V_{x_1}, \dots, V_{x_m}$. Define

$$g := \min\{g_{x_j} : j = 1, \dots, m\}$$

Then $g \in \mathcal{F}$. It follows easily that

$$h(v) - \varepsilon < g(v) < h(v) + \varepsilon \ \text{ for all } v \in K$$

and thus that $\|g - h\|_\infty < \varepsilon$.

Hence for every $h \in C(K)$ and every $\varepsilon > 0$, we can find $g \in \mathcal{F}$ such that $\|g - h\|_\infty < \varepsilon$.
Since $\mathcal{F}$ is closed, it follows that $h \in \mathcal{F}$ for every $h \in C(K)$.

⊣




Theorem 2.8.19 (Stone-Weierstrass Theorem)

Let $K$ be a compact Hausdorff space, and let $\mathcal{F} \subseteq C(K)$ be such that

(1) $\mathcal{F}$ contains all the constant functions;

(2) $\mathcal{F}$ is closed under addition and multiplication;

(3) $\mathcal{F}$ is point separating, i.e. if $x \ne y \in K$, then there is $f \in \mathcal{F}$ such that $f(x) \ne f(y)$.

Then $\mathcal{F}$ is dense in $C(K)$, i.e. every $h \in C(K)$ is a uniform limit of members of $\mathcal{F}$.

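Proof: Let $\overline{\mathcal{F}}$ be the closure of $\mathcal{F}$ in $C(K)$ (w.r.t. the sup-norm). Then $\overline{\mathcal{F}}$ satisfies (1), (2),
(3) as well. By Propn. 2.8.18 it suffices to show that $\overline{\mathcal{F}}$ is closed under absolute values.

Note that (1), (2) imply that if $P$ is a real polynomial and $f \in \overline{\mathcal{F}}$, then $P \circ f \in \overline{\mathcal{F}}$ (where
$(P \circ f)(x) = P(f(x))$ for $x \in K$). Now if $f \in \overline{\mathcal{F}} \subseteq C(K)$, then $M := \|f\|_\infty < \infty$, as $K$ is
compact. The function $A : [-M, M] \to \mathbb{R} : t \mapsto |t|$ is continuous, and is therefore a uniform
limit of polynomials $P_n$, by Thm. 2.8.17. It follows that $|f| = A(f)$ is a uniform limit of
$P_n \circ f \in \overline{\mathcal{F}}$. Thus $|f| \in \overline{\mathcal{F}}$ whenever $f \in \overline{\mathcal{F}}$. Thus $\overline{\mathcal{F}}$ satisfies (1)-(4) of Propn. 2.8.18,
so that $\overline{\mathcal{F}} = C(K)$, i.e. $\mathcal{F}$ is dense in $C(K)$.

⊣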






Chapter 3 

Motivation for Measure Theory 



3.1 What is "Area"?

"Area" is a number associated with certain subsets of the Euclidean plane $\mathbb{R}^2$, i.e. it is a
function $|\cdot|$ which assigns to a set $E \subseteq \mathbb{R}^2$ its area $|E|$. Intuitively, the area function $|\cdot|$
should have certain properties:

1. If $E \subseteq \mathbb{R}^2$ is bounded, then $0 \le |E| < \infty$.

2. $|\cdot|$ is rotation- and translation-invariant: If a set $F$ is obtained by rotating and shifting
a set $E$, then $|F| = |E|$.

3. If $E$ is a rectangle, then $|E| = \text{length} \times \text{breadth}$.

4. $|\cdot|$ is additive: If $E, F$ are disjoint bounded subsets of $\mathbb{R}^2$, then $|E \cup F| = |E| + |F|$.
More generally, if $E_1, E_2, \dots$ is a sequence of mutually disjoint bounded subsets of $\mathbb{R}^2$,
then $|\bigcup_n E_n| = \sum_{n=1}^{\infty} |E_n|$.

It follows easily how to calculate the area of triangles, and thus that of polygons. But
what about non-polygonal sets? For example, how do we justify that the area of a circle of
radius $r$ is $\pi r^2$? Before we do this, do the following exercise.

Exercise 3.1.1 Use the properties (1)-(4) of the area function to show the following:

(a) $|\emptyset| = 0$.

(b) If $A \subseteq B$ then $|B - A| = |B| - |A|$.

(c) If $A \subseteq B$ then $|A| \le |B|$.

(d) If $A_1 \subseteq A_2 \subseteq A_3 \subseteq \dots$ and if $A := \bigcup_n A_n$, then $|A| = \lim_n |A_n|$.

[Hint: Define $B_1 := A_1$, and for $n = 1, 2, 3, \dots$, let $B_{n+1} := A_{n+1} - A_n$. Then $A_n = \bigcup_{k=1}^{n} B_k$ is a union of
disjoint sets. Use the fact that the value of an infinite series is a limit of finite sums.]

□ 

Exercise 3.1.2 We use the properties of the area function to show that the area of an open circle $A$ of
radius $r$ is $|A| = \pi r^2$.

(a) For $n = 1, 2, 3, \dots$, let $A_n$ be the regular open polygon with $2^{n+1}$ sides, inscribed in a circle of radius $r$.
Thus $A_1$ is a square, $A_2$ an octagon, etc. Note that $A_1 \subseteq A_2 \subseteq A_3 \subseteq \dots$. Also note that $\bigcup_n A_n = A$.
(This is why we need the sets $A_n$ and $A$ to be open subsets of $\mathbb{R}^2$.)

(b) $A_n$ consists of $2^{n+1}$ congruent isosceles triangles, constructed by joining each of the sides of the polygon
to the centre of the circle. Explain why each such triangle has area $\tfrac{1}{2} r^2 \sin\frac{\pi}{2^n}$ and conclude that
$|A_n| = 2^n r^2 \sin\frac{\pi}{2^n}$.

(c) Conclude that $|A| = \lim_n \pi r^2 \cdot \dfrac{\sin(\pi/2^n)}{\pi/2^n} = \pi r^2$.

□ 
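The limit in Exercise 3.1.2 can be checked numerically. The sketch below, in Python, simply evaluates the formula $|A_n| = 2^n r^2 \sin(\pi/2^n)$ from part (b); the function name `inscribed_polygon_area` is an illustrative choice, and floating-point arithmetic only approximates the exact areas.

```python
import math

def inscribed_polygon_area(n, r=1.0):
    """Area of the regular polygon with 2**(n+1) sides inscribed in a circle of radius r."""
    return 2**n * r**2 * math.sin(math.pi / 2**n)

# The areas increase with n and approach pi * r^2 (here r = 1).
for n in (1, 2, 5, 10, 20):
    print(n, inscribed_polygon_area(n))
```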

The technique used to compute the area in Exercise 3.1.2 relies on the set $A$ being approximated
from the inside by triangles. Not every set can be so approximated, however. Take
for example the set $A := \{(p,q) : p, q \text{ are rational numbers with } 0 \le p, q \le 1\}$. It is not clear
what the area of $A$ should be: On the one hand, the set $A$ is dense inside the unit rectangle,
so one might guess that $|A| = 1$. If we approximate $A$ from the outside however, we obtain a
convincing argument that $|A| = 0$:

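Exercise 3.1.3 Recall that the set of rational numbers is countable. So we can write the elements of $A$ in
a list: $A = \{(p_n, q_n) : n \in \mathbb{N}\}$. Fix $\varepsilon > 0$. For $n \in \mathbb{N}$, let $R_n$ be a square centred at $(p_n, q_n)$ with sides
of length $\sqrt{\varepsilon/2^n}$, so that $|R_n| = \varepsilon/2^n$.
Let $B = \bigcup_n R_n$, and show that $|B| \le \sum_{n=1}^{\infty} |R_n| = \varepsilon$. Also show that $A \subseteq B$ so that $|A| \le \varepsilon$. Since $\varepsilon > 0$ was
arbitrary, we have $|A| \le \varepsilon$ for all $\varepsilon > 0$, i.e. $|A| = 0$.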

□ 

Observe that the technique used in Exercise 3.1.3 can be used to prove that every countable
subset of $\mathbb{R}^2$ has zero area. It is therefore necessary that the set of real numbers be uncountable
for the concept of area to have a useful meaning!

Later in this course we shall show that it is impossible to assign an area to every bounded
subset of $\mathbb{R}^2$, i.e. there is no function which satisfies each of the properties (1)-(4) of area
above, and which is defined for every bounded subset of $\mathbb{R}^2$. Thus there are subsets of $\mathbb{R}^2$
which have no area. This does not mean that these sets have zero area; it means that there
is no number which can be called their area, and which is consistent with (1)-(4).

Remarks 3.1.4 Just so, some subsets of $\mathbb{R}^3$ fail to have a volume. This is illustrated by the following
theorem, called the Banach-Tarski paradox: It is possible to break up a solid ball the size of a pea into finitely
many pieces, and to rearrange these pieces to form a solid ball the size of the sun (or into pretty much any shape
of any size that you desire)¹. [However, the proof of this theorem depends on an axiom about sets that most
people find obvious, but that mathematicians of the early twentieth century deemed highly controversial,
namely the axiom of choice. This axiom states that if you have a family of non-empty disjoint sets $A_i$ ($i \in I$),
then there is a set $B$ which contains exactly one element from each $A_i$.]

□ 

3.2 Shortcomings of the Riemann Integral 

We recall briefly how the Riemann integral $\int_a^b f(t)\,dt$ is defined: Let $f$ be a real-valued
function defined and bounded on an interval $[a,b]$. A partition $P$ of $[a,b]$ is a finite ordered
set $\{a = t_0 < t_1 < t_2 < \dots < t_n = b\}$. The size of such a partition is denoted $\sigma(P)$, and
defined by

$$\sigma(P) := \max_k\,(t_k - t_{k-1})$$



¹ No, really, this is a theorem.




A tagged partition is a partition $P$ together with a choice $t_k^* \in [t_{k-1}, t_k]$ for each $k = 1, \dots, n$.
Tagged partitions will be indicated by a $*$, i.e. if $P$ is a partition, then $P^*$ denotes an associated
tagged partition.

With each tagged partition, we can associate a Riemann sum

$$S(P^*, f) := \sum_{k=1}^{n} f(t_k^*)\,(t_k - t_{k-1}) = \sum_{k=1}^{n} f(t_k^*)\,\Delta_k t$$

The Riemann integral $\int_a^b f\,dt$ should be the limit of the Riemann sums, over all tagged
partitions $P^*$, as $\sigma(P) \to 0$. To be precise, we say

$$\lim_{\sigma(P)\to 0} S(P^*, f) = L \ \text{ exists}$$

if and only if for every $\varepsilon > 0$ there is $\delta > 0$ such that

$$|S(P^*, f) - L| < \varepsilon \ \text{ whenever } \sigma(P) < \delta$$

Then we define

$$\int_a^b f\,dt := \lim_{\sigma(P)\to 0} S(P^*, f)$$

provided this limit exists, and say that $f$ is Riemann integrable on $[a,b]$.

With each partition $\{a = t_0 < t_1 < \dots < t_n = b\}$ it is possible to associate three natural
tagged partitions, namely those having tags equal to the left endpoint, right endpoint and
midpoint of each interval. This yields:

• The lefthand Riemann sum $\sum_k f(t_{k-1})\,\Delta_k t$;

• The righthand Riemann sum $\sum_k f(t_k)\,\Delta_k t$;

• The symmetric Riemann sum $\sum_k f\big(\tfrac{t_{k-1} + t_k}{2}\big)\,\Delta_k t$.

If $f$ is Riemann integrable over $[a,b]$, then each of these sums must converge as $\sigma(P) \to 0$,
and all to the same limit.
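The three tagged sums can be compared on a concrete integrand. The following Python sketch assumes a uniform partition of $[a,b]$ into $n$ pieces (so $\sigma(P) = (b-a)/n$) and uses the hypothetical test function $f(t) = t^2$ on $[0,1]$, whose integral is $\tfrac{1}{3}$; the name `riemann_sums` is illustrative only.

```python
def riemann_sums(f, a, b, n):
    """Left, right and midpoint Riemann sums of f on the uniform partition of [a, b] into n pieces."""
    h = (b - a) / n
    t = [a + k * h for k in range(n + 1)]          # partition points t_0, ..., t_n
    left = sum(f(t[k - 1]) * h for k in range(1, n + 1))
    right = sum(f(t[k]) * h for k in range(1, n + 1))
    mid = sum(f((t[k - 1] + t[k]) / 2) * h for k in range(1, n + 1))
    return left, right, mid

# For an increasing integrand the left sum underestimates and the right sum overestimates.
left, right, mid = riemann_sums(lambda t: t * t, 0.0, 1.0, 1000)
print(left, right, mid)
```

All three values approach the same limit as $n$ grows, as the text asserts for Riemann integrable $f$.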

Remarks 3.2.1 A slightly different definition uses Darboux sums rather than Riemann sums. Given a
real-valued function $f$ defined and bounded on an interval $[a,b]$, and a partition $P = \{a = t_0 < t_1 < \dots <
t_n = b\}$, let the upper and lower Darboux sums be defined by

$$U(P, f) := \sum_{k=1}^{n} \sup\{f(t) : t \in [t_{k-1}, t_k]\} \cdot (t_k - t_{k-1})$$

$$L(P, f) := \sum_{k=1}^{n} \inf\{f(t) : t \in [t_{k-1}, t_k]\} \cdot (t_k - t_{k-1})$$

If $f$ is continuous on $[a,b]$, it attains its supremum and infimum on each subinterval, i.e. we can choose
$t_k^{\max}, t_k^{\min} \in [t_{k-1}, t_k]$ such that

$$f(t_k^{\max}) = \sup\{f(t) : t \in [t_{k-1}, t_k]\} \qquad f(t_k^{\min}) = \inf\{f(t) : t \in [t_{k-1}, t_k]\}$$

If $P^{*\max}, P^{*\min}$ are the tagged partitions given by $a = t_0 < \dots < t_n = b$ and the tags $t_k^{\max}, t_k^{\min}$ respectively,
then it is easy to see that

$$U(P, f) = S(P^{*\max}, f) \qquad L(P, f) = S(P^{*\min}, f)$$

i.e. the Darboux sums give the most extreme values of the Riemann sums for any given partition. However,
the Darboux sums may differ from Riemann sums if $f$ is not continuous.

The Riemann sums may be defined even when $f$ is Banach space-valued, however, whereas the Darboux
sums, being dependent on sup's and inf's, make sense for real-valued functions only.




□ 

From an earlier course in analysis, we know that the Riemann integral $\int_a^b f\,dt$ exists
when $f$ is continuous (or even piecewise continuous) on $[a,b]$. When the function is too
discontinuous, we run into trouble, however:



Example 3.2.2 Consider the Dirichlet function

$$I_{\mathbb{Q}}(t) := \begin{cases} 1 & \text{if } t \in \mathbb{Q} \\ 0 & \text{else} \end{cases}$$

where $\mathbb{Q}$ is the set of rational numbers. If $P = \{a = t_0 < t_1 < \dots < t_n = b\}$ is any partition of $[a,b]$, no
matter how fine, we can always find tags $t_k^*, t_k' \in [t_{k-1}, t_k]$ so that $t_k^*$ is rational, and $t_k'$ is irrational. Thus
$I_{\mathbb{Q}}(t_k^*) = 1$, $I_{\mathbb{Q}}(t_k') = 0$. It follows that

$$S(P^*, I_{\mathbb{Q}}) = \sum_k 1 \cdot (t_k - t_{k-1}) = b - a \qquad S(P', I_{\mathbb{Q}}) = \sum_k 0 \cdot (t_k - t_{k-1}) = 0$$

and thus $S(P^*, I_{\mathbb{Q}})$, $S(P', I_{\mathbb{Q}})$ cannot be made to lie arbitrarily close to each other, no matter how fine the
partition $P$. Thus $\lim_{\sigma(P)\to 0} S(P^*, I_{\mathbb{Q}})$ does not exist.

□ 

When the Riemann integral is first encountered, it is taught as "the area under a curve":
If $f \ge 0$ is continuous, then $\int_a^b f\,dt$ is the area under the curve described by $f$, between $t = a$
and $t = b$. For $A \subseteq \mathbb{R}$, define the indicator function of $A$ by

$$I_A(t) := \begin{cases} 1 & \text{if } t \in A \\ 0 & \text{else} \end{cases}$$

Consider now the function $I_{\mathbb{Q}}$, where $\mathbb{Q}$ is the set of rational numbers. This function is very
discontinuous. If we try to compute the area under this "curve" over the interval $[0,1]$ using the Riemann integral,
we run into trouble: The Riemann integral $\int_0^1 I_{\mathbb{Q}}\,dt$ does not exist.

We can make a convincing argument that the area under the curve over the interval $[0,1]$
should be zero, as follows: Use the fact that $\mathbb{Q}$ is countable to enumerate the rational numbers
in $[0,1]$, i.e. write $[0,1] \cap \mathbb{Q} = \{q_n : n \in \mathbb{N}\}$. For any $\varepsilon > 0$, define

$$B_n := \Big[q_n - \frac{\varepsilon}{2^{n+1}},\ q_n + \frac{\varepsilon}{2^{n+1}}\Big] \ \text{ for } n \in \mathbb{N}, \qquad f := I_{\bigcup_n B_n}$$

The area under the curve of $f$ is made up of (possibly overlapping) rectangles of height 1
centered at the rational numbers. Thus the area under $f$ over $[0,1]$ is
$\le \sum_n 1 \cdot (\text{length of } B_n) = \sum_{n=1}^{\infty} \frac{\varepsilon}{2^n} = \varepsilon$.
It is also clear that $0 \le I_{\mathbb{Q}} \le f$, and thus that the area under $I_{\mathbb{Q}}$ is at most the
area under $f$, i.e. that the area under $I_{\mathbb{Q}}$ is $\le \varepsilon$. Since this is true for any $\varepsilon > 0$, we conclude
that the area under $I_{\mathbb{Q}}$ is 0.
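The covering argument can be made concrete for finitely many rationals. The Python sketch below assumes one particular enumeration of $[0,1] \cap \mathbb{Q}$ (ordered by denominator) and intervals of half-width $\varepsilon/2^{n+1}$ indexed from $n = 1$; the function names and the choice $\varepsilon = 1/100$ are illustrative. Exact rational arithmetic is used so that the total length can be compared with $\varepsilon$ without rounding.

```python
from fractions import Fraction

def rationals_in_unit_interval(count):
    """List the first `count` distinct rationals in [0, 1], ordered by denominator."""
    seen, out, q = set(), [], 1
    while len(out) < count:
        for p in range(q + 1):
            x = Fraction(p, q)
            if x not in seen:
                seen.add(x)
                out.append(x)
                if len(out) == count:
                    break
        q += 1
    return out

def cover(eps, count):
    """Intervals B_n = [q_n - eps/2^(n+1), q_n + eps/2^(n+1)] for n = 1, ..., count."""
    qs = rationals_in_unit_interval(count)
    return [(q - eps / 2**(n + 1), q + eps / 2**(n + 1))
            for n, q in enumerate(qs, start=1)]

eps = Fraction(1, 100)
intervals = cover(eps, 50)
total_length = sum(b - a for a, b in intervals)   # equals eps * (1 - 2**-50)
print(float(total_length))
```

However many rationals are covered, the total length stays below $\varepsilon$, which is the heart of the argument.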

Thus we have the following:

$$\int_0^1 I_{\mathbb{Q}}\,dt \ \text{ is undefined, but it should be zero}$$

The Riemann integral is simply not powerful enough to handle functions like $I_{\mathbb{Q}}$.

You may counter that a function such as $I_{\mathbb{Q}}$ is pathological, and unlikely to be encountered
in practice. It is true that we chose it here simply to make a point. However, the following
example should cause you to feel uneasy about the assertion that $I_{\mathbb{Q}}$ is "pathological":




Example 3.2.3 Consider the function $g(t)$ defined by

$$g(t) = \lim_{m\to\infty}\ \lim_{n\to\infty} \cos(m!\,\pi t)^{2n}$$

If $t$ is a rational number, i.e. $t = \tfrac{p}{q}$ where $p \in \mathbb{Z}$, $q \in \mathbb{N}$, then $m!\,t \in \mathbb{Z}$ for all $m \ge q$. Consequently,
$\cos(m!\,\pi t) = \pm 1$ for all $m \ge q$, and thus $\cos(m!\,\pi t)^{2n} = 1$ for all $n$ and all $m \ge q$. It follows that $g(t) = 1$ when
$t$ is rational.

On the other hand, if $t$ is irrational, then $0 \le |\cos(m!\,\pi t)| < 1$ for all $m$, and so $0 \le \cos(m!\,\pi t)^{2n} < 1$ for all
$m$. Now if $0 \le x < 1$, then $x^n \to 0$. It follows that $\lim_{n\to\infty} \cos(m!\,\pi t)^{2n} = 0$ for all $m$, and thus that $g(t) = 0$
when $t$ is irrational. Hence

$$I_{\mathbb{Q}} = \lim_{m\to\infty}\ \lim_{n\to\infty} \cos(m!\,\pi t)^{2n}$$

The "pathological" function $I_{\mathbb{Q}}$ therefore appears as a limit of perfectly ordinary functions.

□ 
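The behaviour of the inner expression can be illustrated numerically. The Python sketch below evaluates $\cos(m!\,\pi t)^{2n}$ for one rational and one (approximately) irrational $t$; the helper name `inner` and the sample values are illustrative. Note the caveat that a floating-point number such as `math.sqrt(2) / 2` is only a rational approximation of an irrational number, so the sketch shows the trend for moderate $m$ rather than the true double limit.

```python
import math
from fractions import Fraction

def inner(t, m, n):
    """cos(m! * pi * t)^(2n), reducing m! * t modulo 2 first to limit rounding error."""
    x = (math.factorial(m) * t) % 2        # cos(pi * x) has period 2 in x
    return math.cos(math.pi * float(x)) ** (2 * n)

# Rational t = 3/7: m! * t is an integer once m >= 7, so the value is exactly 1.
rational = inner(Fraction(3, 7), m=7, n=50)

# t close to sqrt(2)/2: |cos| < 1, so a large even power is vanishingly small.
irrational = inner(math.sqrt(2) / 2, m=5, n=2000)
print(rational, irrational)
```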

This brings us to the next problem: The Riemann integral does not handle limits well.

There are many other examples of functions $f_n \to f$, where the $f_n$ are perfectly good Riemann
integrable functions, and where $\int_a^b f\,dt$ should be $\lim_n \int_a^b f_n\,dt$, but where $f$ fails to be
Riemann integrable. A better integral is called for.

3.3 Motivation from Probability Theory 

A model for an experiment involving randomness takes the form $(\Omega, \mathcal{F}, \mathbb{P})$. Intuitively, $\Omega$
is the set of all possible outcomes of the experiment, and is called the sample space. $\mathcal{F}$ is
the set of all events, i.e. "permissible" combinations of outcomes. (We shall see that not
all combinations of outcomes are permissible: it is simply impossible to create a consistent
mathematical theory of probability in which every set of outcomes is permissible.) $\mathbb{P}$ is a map
$\mathcal{F} \to [0,1]$ which assigns to each (permissible) event its probability.

Example 3.3.1 A die is rolled once. The possible outcomes are the integers between one and six. Thus the
sample space can be taken to be $\Omega = \{1, 2, \dots, 6\}$. We may be interested in the following events:

(a) The outcome is the number 1;

(b) The outcome is an even number;

(c) The outcome is an odd number which is strictly greater than 1.

Each of these events can be described by a subset of the sample space. Thus if $A, B, C$ are the subsets
corresponding to the events (a), (b), (c), then

A = {1}
B = {2, 4, 6}
C = {3, 5}

The probabilities of these events, by elementary reasoning, are $\mathbb{P}(A) = \tfrac{1}{6}$, $\mathbb{P}(B) = \tfrac{1}{2}$ and $\mathbb{P}(C) = \tfrac{1}{3}$, provided
that the die is fair. Every subset of $\Omega$ is a "permissible" event, and thus $\mathcal{F} = \mathcal{P}(\Omega)$.

□ 
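The die example translates directly into code. The following Python sketch models events as subsets of the sample space and assumes, as in the example, a fair die (the uniform distribution); the names `omega` and `prob` are illustrative.

```python
from fractions import Fraction

omega = {1, 2, 3, 4, 5, 6}  # sample space for one roll of a fair die

def prob(event):
    """Probability of an event (a subset of omega) under the uniform distribution."""
    assert event <= omega
    return Fraction(len(event), len(omega))

A = {1}                                          # the outcome is 1
B = {w for w in omega if w % 2 == 0}             # the outcome is even
C = {w for w in omega if w % 2 == 1 and w > 1}   # odd and strictly greater than 1
print(prob(A), prob(B), prob(C))
```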

Mathematically, an event is a set, i.e. events are just subsets of the sample space. The
outcome of any random experiment must be some element $\omega$ of the sample space $\Omega$. Now $\Omega$ is
itself a subset of $\Omega$ and thus corresponds to some event. We call it the certain event, since we
are certain that $\omega \in \Omega$. We must always have $\mathbb{P}(\Omega) = 1$. The empty set $\emptyset$ is also a subset of
$\Omega$ and thus corresponds to some event. We call it the impossible event, since it is impossible
that an outcome $\omega$ is in $\emptyset$. We will always have $\mathbb{P}(\emptyset) = 0$.




Note that the sample space corresponding to a random experiment need not be unique.
Consider, for example, the random experiment of rolling two dice. Then we can choose the
sample space to be the 36-element set $\Omega_1 = \{(i,j) : i, j \text{ positive integers between 1 and 6}\}$.
The probabilities for each outcome are then the same: $\mathbb{P}(\omega) = \tfrac{1}{36}$ for each $\omega \in \Omega_1$. This is a
so-called uniform distribution. On the other hand, we can choose the sample space to be the
11-element set $\Omega_2 = \{2, 3, 4, \dots, 12\}$ corresponding to the total of the two dice. In this case,
the probability distribution is non-uniform: $\mathbb{P}(\{7\}) = \tfrac{1}{6}$ whereas $\mathbb{P}(\{2\}) = \tfrac{1}{36}$. Choosing the
sample space and the corresponding probability distribution for a particular situation is part
of the art of probabilistic modelling.
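The passage from the 36-element sample space to the 11-element one can be computed by counting. The Python sketch below assumes two fair dice and derives the non-uniform distribution of the total from the uniform distribution on pairs; the names `omega1` and `dist` are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

omega1 = list(product(range(1, 7), repeat=2))  # 36 equally likely outcomes (i, j)

# Push the uniform distribution on omega1 forward to the totals 2, ..., 12.
counts = Counter(i + j for i, j in omega1)
dist = {total: Fraction(c, len(omega1)) for total, c in counts.items()}

for total in sorted(dist):
    print(total, dist[total])
```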

Example 3.3.2 A coin is flipped until the first head turns up. This may happen on the first toss or the
second, or ... or never. Thus the sample space is $\Omega = \{\omega_1, \omega_2, \dots, \omega_\infty\}$, where the outcome $\omega_n$ denotes the
event that the first head turns up on the $n$-th toss, and $\omega_\infty$ denotes the event of never flipping heads. It is clear
from elementary probability that $\mathbb{P}(\{\omega_n\}) = \tfrac{1}{2^n}$ (provided that the coin is fair). We may now consider various
composite events, such as:

(a) Let $A$ be the event that the first head appears on either the third or the fourth toss. Then $A = \{\omega_3\} \cup
\{\omega_4\} = \{\omega_3, \omega_4\}$. Clearly $\mathbb{P}(A) = \tfrac{1}{2^3} + \tfrac{1}{2^4}$.

(b) Let $B$ be the event that the first head appears after an even number of tosses. Thus $B = \bigcup_{n=1}^{\infty} \{\omega_{2n}\}$ and
$\mathbb{P}(B) = \sum_{n=1}^{\infty} \tfrac{1}{2^{2n}} = \tfrac{1}{3}$. Did you think that the probability that the first head appears after an even number
of tosses is $\tfrac{1}{2}$? If so, note that the probability that the first head appears on the first toss is $\tfrac{1}{2}$, and the
probability that the first head appears after an odd number of tosses is therefore greater than $\tfrac{1}{2}$.

(c) Let $C$ be the event that both events $A$ and $B$ occur. Clearly $C = \{\omega_4\} = A \cap B$.

(d) Let $D$ be the event that a head does occur after a finite number of tosses. Thus $D$ is the complement of
the event that heads never occurs. Thus $D = \Omega - \{\omega_\infty\} = \{\omega_1, \omega_2, \dots\}$. Hence $\mathbb{P}(D) = \sum_{n=1}^{\infty} \tfrac{1}{2^n} = 1$. This
can also be seen from the fact that $\mathbb{P}(\{\omega_\infty\}) = 0$.

□ 
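The series in this example can be checked with exact partial sums. The Python sketch below assumes a fair coin, so that the first head appears on toss $n$ with probability $1/2^n$; the function names and the truncation level `N = 30` are illustrative. The partial sums fall short of the limits $\tfrac{1}{3}$ and $1$ by an explicitly computable tail.

```python
from fractions import Fraction

def p_first_head(n):
    """Probability that the first head appears on toss n, for a fair coin."""
    return Fraction(1, 2**n)

def partial_sum(indices):
    """Sum of the probabilities p_first_head(n) over a finite collection of indices n."""
    return sum((p_first_head(n) for n in indices), Fraction(0))

N = 30
p_A = partial_sum([3, 4])                           # event (a)
p_B_partial = partial_sum(range(2, 2 * N + 1, 2))   # even n up to 2N, event (b)
p_D_partial = partial_sum(range(1, N + 1))          # all n up to N, event (d)
print(p_A, float(p_B_partial), float(p_D_partial))
```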

3.4 Structure of Events 

In order for our mathematical theory of probability to bear some resemblance to the real 
world, it is clear that we should be able to combine events in the following ways: 

• If $A$ is an event, then the possibility of $A$ not occurring should also be an event. Now
if the outcome of a random experiment is $\omega \in \Omega$, then the event $A$ occurs if and only if
$\omega \in A$ (remember that we consider an event to be a subset of the sample space). Thus
the event that $A$ does not occur corresponds to $\omega \notin A$, i.e. to the set $A^c = \Omega - A$. We
want the probabilities of these events to be related by $\mathbb{P}(A^c) = 1 - \mathbb{P}(A)$.

• If $A, B$ are events, then the possibility of both $A$ and $B$ occurring should also be an
event. Now if the outcome of a random experiment is $\omega \in \Omega$, then both $A$ and $B$ occur
if and only if $\omega \in A$ and $\omega \in B$, i.e. if and only if $\omega \in A \cap B$. Thus the event of both $A$
and $B$ occurring corresponds to the set $A \cap B$.

• In the same way, if $A$ and $B$ are events, then the possibility of at least one of $A$ or $B$
occurring should be an event as well. This corresponds to the set $A \cup B$. We say that
events are disjoint or mutually exclusive if they cannot occur simultaneously. Thus if
$A, B$ are disjoint, then $\omega \in A$ implies $\omega \notin B$. Clearly, therefore, $A$ and $B$ are disjoint
if and only if $A \cap B = \emptyset$ (i.e. the event that both $A$ and $B$ occur is impossible).
For disjoint events $A$ and $B$, we want $\mathbb{P}(A \cup B) = \mathbb{P}(A) + \mathbb{P}(B)$. This is because
$\mathbb{P}(A \cup B) = \frac{N_{A\cup B}}{N_\Omega} = \frac{N_A + N_B}{N_\Omega} = \mathbb{P}(A) + \mathbb{P}(B)$ (where $N_A$ is the number of elements in
the set $A$, etc.)

• In fact we demand more: Given a countable list of events $A_1, A_2, A_3, \dots$, the possibilities
of either all of these events occurring, or of at least one of these events occurring, should
also be events. They correspond to the sets $\bigcap_{n=1}^{\infty} A_n$ and $\bigcup_{n=1}^{\infty} A_n$ respectively. If the
events $A_n$ are mutually disjoint, i.e. if $A_n \cap A_m = \emptyset$ whenever $n \ne m$, then we want

$$\mathbb{P}\Big(\bigcup_{n=1}^{\infty} A_n\Big) = \sum_{n=1}^{\infty} \mathbb{P}(A_n)$$

The concept of probability has rather a lot in common with that of area: 

• "Probability" is measured by a non-negative number $\mathbb{P}(A)$ assigned to a subset $A$ of
the sample space $\Omega$.
"Area" is measured by a non-negative number $|A|$ assigned to a subset $A$ of $\mathbb{R}^2$.

• $\mathbb{P}(\emptyset) = 0$;
$|\emptyset| = 0$.

• If $A_n$ are disjoint events, then $\mathbb{P}(\bigcup_n A_n) = \sum_n \mathbb{P}(A_n)$;
If $A_n$ are disjoint sets, then $|\bigcup_n A_n| = \sum_n |A_n|$.

When we isolate and study the common features of probability and area, we get the
subject of measure theory. We shall show that we can develop a theory which allows us to
form integrals $\int f\,d\mu$ of functions $f$ with respect to measures $\mu$ (rather than variables). It will
turn out that the integral with respect to Lebesgue measure (yet to be defined) is precisely the
more powerful generalization of the Riemann integral that we seek. It will also transpire that
the integral with respect to a probability measure precisely captures the notion of probabilistic
expectation.

Armed with the intuition and motivation provided by the above examples, we now proceed 
with the formal theory. 






Chapter 4 

Measure Spaces 



4.1 Events and σ-algebras

To model a random experiment, we need to define three objects: 

• A sample space $\Omega$, representing the possible outcomes of the experiment. The outcomes
$\omega \in \Omega$ are called sample points.

• A family $\mathcal{F}$ of events.
An event is a (permissible/relevant) subset of $\Omega$. If $A$ is an event, we say that $A$ occurs
if the outcome $\omega$ is an element of $A$.
We shall require $\mathcal{F}$ to be a $\sigma$-algebra (which we define below).

• A probability measure $\mathbb{P}$ which assigns to each event a probability.
Thus $\mathbb{P} : \mathcal{F} \to [0,1]$.

The triple $(\Omega, \mathcal{F}, \mathbb{P})$ will be called a probability space (subject to certain conditions on $\mathcal{F}$ and
$\mathbb{P}$).

Recall that an event $E \in \mathcal{F}$ is said to have occurred if the outcome $\omega$ of the random
experiment belongs to $E$. Intuitively, we think of $\mathcal{F}$ as a set of events $E$ for which we can
decide whether or not $E$ occurred at the termination of the experiment. Note: whether or
not.

This intuition imposes the following constraints on $\mathcal{F}$:

(a) $\Omega \in \mathcal{F}$ and $\emptyset \in \mathcal{F}$.
Indeed, every outcome $\omega$ belongs to $\Omega$, and thus the event $\Omega$ always occurs: it is the
certain event.
Similarly, no outcome $\omega$ belongs to $\emptyset$, and thus the event $\emptyset$ never occurs: it is the
impossible event.

(b) If $E \in \mathcal{F}$ then $E^c \in \mathcal{F}$, i.e. $\mathcal{F}$ is closed under complementation.
For if we can decide whether or not $E$ occurred, then we can also decide whether or not
$E^c$ occurred: For suppose that the outcome of the experiment is $\omega$. If $E$ occurred, then
$\omega \in E$, so $\omega \notin E^c$, hence $E^c$ did not occur.
Similarly, if $E$ did not occur, then $E^c$ did occur.




(c) If $E_1, E_2 \in \mathcal{F}$ then $E_1 \cap E_2 \in \mathcal{F}$, i.e. $\mathcal{F}$ is closed under intersection.
For if we can decide whether or not $E_1$ occurred, and also whether or not $E_2$ occurred,
then we can decide whether or not $E_1 \cap E_2$ occurred: $E_1 \cap E_2$ occurred iff $\omega \in E_1 \cap E_2$
iff $\omega \in E_1$ and $\omega \in E_2$ iff both $E_1$ and $E_2$ occurred.
Thus if we can decide whether or not $E_1, E_2$ occurred, we can also decide whether or not
$E_1 \cap E_2$ occurred.

(d) Similarly, $\mathcal{F}$ is closed under union: The event $E_1 \cup E_2$ occurs iff either $E_1$ occurred, or
$E_2$ occurred (or both).

(e) We can generalize (c) and (d) somewhat: If $E_1, E_2, E_3, \dots, E_n, \dots$ is a countable sequence
of members of $\mathcal{F}$, then also $\bigcap_n E_n \in \mathcal{F}$ and $\bigcup_n E_n \in \mathcal{F}$, i.e. $\mathcal{F}$ is closed under countable
intersections and unions.
For $\bigcap_n E_n$ occurred iff each of the $E_n$ occurred, and $\bigcup_n E_n$ occurred iff at least one of
the $E_n$ occurred. Thus if we can decide whether or not each $E_n$ occurred, we can also
decide whether or not $\bigcap_n E_n$ and $\bigcup_n E_n$ occurred.

This leads to the following definitions: 

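Definition 4.1.1 Let $\Omega$ be a set. A collection $\mathcal{A}$ of subsets of $\Omega$ is called an algebra (or
field) on $\Omega$ if

(i) $\Omega \in \mathcal{A}$;

(ii) $A \in \mathcal{A} \Rightarrow A^c \in \mathcal{A}$;

(iii) $A, B \in \mathcal{A} \Rightarrow A \cup B \in \mathcal{A}$.

A collection $\mathcal{F}$ of subsets of $\Omega$ is called a $\sigma$-algebra (or $\sigma$-field) if it satisfies (i), (ii) and
(iii)* If $A_n \in \mathcal{F}$ (for $n \in \mathbb{N}$), then $\bigcup_n A_n \in \mathcal{F}$.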
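On a finite set the defining conditions of an algebra can be checked mechanically. The Python sketch below assumes condition (i) reads "the whole set belongs to the family" and tests closure under complements and pairwise unions; the function name `is_algebra` and the two sample families are illustrative, not from the text.

```python
from itertools import combinations

def is_algebra(omega, family):
    """Check the algebra conditions for a family of subsets of a finite set omega."""
    omega = frozenset(omega)
    fam = {frozenset(s) for s in family}
    if omega not in fam:                            # (i) omega belongs to the family
        return False
    if any(omega - a not in fam for a in fam):      # (ii) closed under complements
        return False
    return all(a | b in fam for a, b in combinations(fam, 2))  # (iii) closed under unions

omega = {1, 2, 3, 4}
good = [set(), {1, 2}, {3, 4}, omega]   # an algebra with two "atoms"
bad = [set(), {1}, {2}, omega]          # the complement of {1} is missing
print(is_algebra(omega, good), is_algebra(omega, bad))
```

Since a finite family has only finitely many distinct unions, the same check also decides whether the family is a $\sigma$-algebra on a finite set.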

Exercise 4.1.2 (i) An algebra is closed under (finite) intersections.

(ii) A $\sigma$-algebra is closed under countable intersections.

(iii) A $\sigma$-algebra is also an algebra.

(iv) If $\Omega$ is a finite set, then any algebra on $\Omega$ is also a $\sigma$-algebra.

(v) If $\mathcal{A}$ is an algebra and $A, B \in \mathcal{A}$, then $A - B$ and $A \triangle B$ belong to $\mathcal{A}$.

(vi) If $\Omega$ is a set, then $\mathcal{F}_0 = \{\emptyset, \Omega\}$ is the smallest $\sigma$-algebra on $\Omega$, and $\mathcal{F}_\infty = \mathcal{P}(\Omega)$ is the biggest
$\sigma$-algebra on $\Omega$.

(vii) If $\{\mathcal{F}_i : i \in I\}$ is a family of $\sigma$-algebras on $\Omega$, then $\mathcal{F} := \bigcap_{i\in I} \mathcal{F}_i$ is also a $\sigma$-algebra on $\Omega$.

□ 

Events are organized in $\sigma$-algebras. The set-theoretic operations $\bigcup$, $\bigcap$, $^c$ correspond
to logical combinations or, and, not of events.

Frequently, the events of interest form a collection $\mathcal{C}$ which is not a $\sigma$-algebra. Suppose
that $\mathcal{C}$ is a collection of events which can be decided, i.e. if $E \in \mathcal{C}$, then we can decide whether
or not $E$ occurred. We can then also decide whether or not $E^c$ occurred, but $E^c$ may not be
an element of $\mathcal{C}$. The bigger set $\mathcal{F}$ of all events that can be decided, given that we can decide
all the events in $\mathcal{C}$, is a $\sigma$-algebra containing $\mathcal{C}$.




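Definition and Proposition 4.1.3 Let $\mathcal{C}$ be a family of subsets of $\Omega$. There exists a
unique smallest $\sigma$-algebra $\mathcal{F}$ which contains $\mathcal{C}$ (i.e. $\mathcal{C} \subseteq \mathcal{F}$, and if $\mathcal{G}$ is any $\sigma$-algebra such
that $\mathcal{C} \subseteq \mathcal{G}$, then $\mathcal{F} \subseteq \mathcal{G}$ also).

$\mathcal{F}$ is called the $\sigma$-algebra generated by $\mathcal{C}$, and denoted by

$$\mathcal{F} = \sigma(\mathcal{C})$$

Proof: Let $\mathbb{F} = \{\mathcal{G} : \mathcal{G} \text{ a } \sigma\text{-algebra with } \mathcal{C} \subseteq \mathcal{G}\}$, and let $\mathcal{F} = \bigcap \mathbb{F}$. Then $\mathcal{F} \in \mathbb{F}$. (Why?)
Moreover, if $\mathcal{G}$ is a $\sigma$-algebra which contains $\mathcal{C}$, then $\mathcal{G} \in \mathbb{F}$, and so $\mathcal{F} \subseteq \mathcal{G}$. (Why?)

⊣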

We repeat the following important intuition:

$\sigma(\mathcal{C})$ consists of all those events $F$ for which we can decide whether or not $F$ has occurred,
given that we know exactly which of the $E \in \mathcal{C}$ have occurred.
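For a finite sample space the generated $\sigma$-algebra can be computed by brute force. The Python sketch below does not form the intersection used in the proof; instead it assumes the equivalent "bottom-up" description, valid for finite sets, in which one repeatedly adds complements and unions until nothing new appears. The name `generate_sigma_algebra` and the die events used are illustrative.

```python
def generate_sigma_algebra(omega, collection):
    """Smallest family containing `collection` that is closed under complement and union.

    On a finite omega this is the generated sigma-algebra, since countable
    unions reduce to finite ones.
    """
    omega = frozenset(omega)
    family = {omega, frozenset()} | {frozenset(c) for c in collection}
    while True:
        new = {omega - a for a in family} | {a | b for a in family for b in family}
        if new <= family:          # nothing new: the family is closed
            return family
        family |= new

omega = {1, 2, 3, 4, 5, 6}
# Knowing whether "the outcome is 1" and "the outcome is even" occurred ...
sigma = generate_sigma_algebra(omega, [{1}, {2, 4, 6}])
print(sorted(sorted(s) for s in sigma))
```

In this example the result has 8 members: one can decide whether the outcome lies in $\{3,5\}$, but not whether it equals 2, which matches the "decidability" intuition.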



Definition 4.1.4 If the sample space $\Omega$ is a topological space, we define the Borel algebra

$$\mathcal{B}(\Omega) = \sigma(\text{open sets of } \Omega)$$

In particular, $\mathcal{B}(\mathbb{R})$ is the smallest $\sigma$-algebra on $\mathbb{R}$ which contains all the open subsets of
$\mathbb{R}$.

$\mathcal{B}(\mathbb{R})$ is one of the most important $\sigma$-algebras.

Exercise 4.1.5 Prove that the following sets belong to $\mathcal{B}(\mathbb{R})$:

(i) All closed intervals $[x, y]$, where $x \le y \in \mathbb{R}$.

(ii) The half-open intervals $(x, y]$ and $[x, y)$, where $x < y \in \mathbb{R}$.

(iii) Every singleton $\{x\}$, where $x \in \mathbb{R}$.

(iv) Every countable subset of $\mathbb{R}$.

(v) The sets $\mathbb{Q}$ of rational numbers and Irr of irrational numbers.

□ 

Exercise 4.1.6 Define

$\mathcal{C}$ := collection of all intervals of the form $(-\infty, x]$, where $x \in \mathbb{R}$

Show that $\sigma(\mathcal{C}) = \mathcal{B}(\mathbb{R})$.

(You may use the fact that every open subset of $\mathbb{R}$ can be represented as a union of countably many open
intervals.)

□ 

4.2 Measures 

The notion of measure generalizes the concepts of length, area, volume, mass, and probability. 




Definition 4.2.1 Let $\mathcal{F}$ be a $\sigma$-algebra on a set $\Omega$. A function $\mu : \mathcal{F} \to \overline{\mathbb{R}}$ (the extended real line) is called a
(countably additive, non-negative) measure if and only if

(i) $\mu$ is non-negative: $0 \le \mu A \le \infty$ for each $A \in \mathcal{F}$.

(ii) $\mu\emptyset = 0$.

(iii) $\mu$ is countably additive (or $\sigma$-additive): If $A_1, A_2, \dots \in \mathcal{F}$ is a countable sequence of
pairwise disjoint sets, then

$$\mu\Big(\bigcup_n A_n\Big) = \sum_n \mu A_n$$

If $\mu\Omega = 1$, then $\mu$ is called a probability measure.



If $\mathcal{F}$ is a $\sigma$-algebra on a set $\Omega$, then the pair $(\Omega, \mathcal{F})$ is called a measurable space. The
elements of $\mathcal{F}$ are called measurable sets, or events in the probabilistic framework. If, in
addition, $\mu$ is a measure on $\mathcal{F}$, the triple $(\Omega, \mathcal{F}, \mu)$ is called a measure space. The symbols
$\mathbb{P}, \mathbb{Q}$ are used for probability measures, and $(\Omega, \mathcal{F}, \mathbb{P})$ will always denote a probability space.

Example 4.2.2 Important: Lebesgue Measure: We shall prove later that there is a unique measure $\lambda$ on
$(\mathbb{R}, \mathcal{B}(\mathbb{R}))$, which assigns every interval its length, i.e.

$$\lambda(a, b) = \lambda(a, b] = \lambda[a, b] = b - a$$

This measure is called Lebesgue measure; it provided the original impetus for the development of the
subject of measure theory.

Lebesgue measure is also important in probability theory. Consider, for example, the experiment of drawing
a uniformly distributed random number from the unit interval $[0,1]$. The probability of drawing a number
$\ge \tfrac{1}{2}$ is $\mathbb{P}([\tfrac{1}{2}, 1]) = \tfrac{1}{2}$. The probability of drawing a number between $\tfrac{1}{8}$ and $\tfrac{1}{2}$ is $\mathbb{P}([\tfrac{1}{8}, \tfrac{1}{2}]) = \tfrac{3}{8}$. Similarly,
the probability of drawing a number between $a$ and $b$ (where $0 \le a \le b \le 1$) is $\mathbb{P}([a,b]) = b - a$. Thus the
appropriate measure $\mathbb{P}$ is just Lebesgue measure, restricted to $[0,1]$.

There are higher dimensional analogues of Lebesgue measure: There is a measure, also denoted $\lambda$ and
called Lebesgue measure, on $(\mathbb{R}^n, \mathcal{B}(\mathbb{R}^n))$, which assigns to every $n$-dimensional rectangle its volume.

□ 

Example 4.2.3 Suppose that $F : \mathbb{R} \to \mathbb{R}$ is an increasing right-continuous function, i.e.

$$F(s) \le F(t) \ \text{ when } s \le t \qquad \text{and} \qquad F(t) = \lim_{s \downarrow t} F(s)$$

We shall prove later that there is a unique measure $\mu_F$ on $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$ with the property that

$$\mu_F(a, b] = F(b) - F(a) \ \text{ for all } a \le b \in \mathbb{R}$$

$\mu_F$ is called the Lebesgue-Stieltjes measure associated with $F$. Note that if $F(t) := t$, then $\mu_F = \lambda$ is Lebesgue
measure.



□ 



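Exercise 4.2.4 Let $(X, \mathcal{F})$ be a measurable space, and let $x_0 \in X$. Define $\delta_{x_0} : \mathcal{F} \to \mathbb{R}$ by

$$\delta_{x_0}(F) := \begin{cases} 1 & \text{if } x_0 \in F \\ 0 & \text{if } x_0 \notin F \end{cases}$$

Show that $\delta_{x_0}$ is a measure on $(X, \mathcal{F})$.

$\delta_{x_0}$ is called the Dirac measure, or point mass, at $x_0$.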



□ 




Note that, for general measures, we allow +00 as a value. For example, the length of the 
real line is +00, so A(M) = +00, were A is Lebesgue measure on (M, ;S(M)). However, we often 
need to get a "handle" on infinity: 

Definition 4.2.5 A measure $\mu$ on a measurable space $(\Omega, \mathcal{F})$ is called

(a) finite, if $\mu\Omega < \infty$;

(b) $\sigma$-finite, if $\Omega$ is the countable union of sets of finite measure, i.e. if there is a sequence
$A_1, A_2, \dots$ of measurable sets such that each $\mu A_n < \infty$, and such that $\Omega = \bigcup_n A_n$.

Thus Lebesgue measure is $\sigma$-finite, but not finite (Why?).
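Proposition 4.2.6 (Additivity Properties)

Suppose that $(\Omega, \mathcal{F}, \mu)$ is a measure space, and let $A, B, A_1, A_2, \dots \in \mathcal{F}$.

(a) If $A \subseteq B$, then $\mu A \le \mu B$.

(b) If $A \subseteq B$ and $\mu A < \infty$, then $\mu(B - A) = \mu B - \mu A$.

(c) $\mu(A \cup B) = \mu A + \mu B - \mu(A \cap B)$

(d) $\mu(\bigcup_n A_n) \le \sum_n \mu A_n$.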

Exercise 4.2.7 (a) Prove the preceding proposition. 

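(b) Suppose that $(\Omega, \mathcal{F}, \mathbb{P})$ is a probability space, and that $A, A_1, A_2, \dots \in \mathcal{F}$.

(i) Show that $\mathbb{P}(A^c) = 1 - \mathbb{P}A$.

(ii) Show that if $\mathbb{P}A_n = 1$ for $n \in \mathbb{N}$, then $\mathbb{P}(\bigcap_n A_n) = 1$ also.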

□ 

Next, we introduce some useful terminology: 

Definition 4.2.8 Let $(\Omega, \mathcal{F}, \mu)$ be a measure space, and let $A \subseteq \Omega$.

(a) We say that A is $\mu$-null if there exists $B \in \mathcal{F}$ such that $A \subseteq B$ and $\mu B = 0$.

(b) We shall say that a statement $\varphi$ holds $\mu$-almost everywhere (or $\mu$-almost surely in the
probabilistic framework), if the set of $\omega \in \Omega$ where $\varphi$ fails to hold is $\mu$-null.

We abbreviate $\mu$-almost everywhere and $\mu$-almost surely by $\mu$-a.e. and $\mu$-a.s. respectively.

Remarks 4.2.9 Note that in (a) above, the set A might not belong to $\mathcal{F}$, so $\mu A$ might be undefined.
However, $\mu A$ "ought" to be zero. Later, this insight will allow us to extend measures to $\sigma$-algebras larger than
the ones we start off with.

As an example of (b), consider the reals with Lebesgue measure: $(\mathbb{R}, \mathcal{B}(\mathbb{R}), \lambda)$. Every point is $\lambda$-null, since
$\lambda\{x\} = \lambda[x, x] = x - x = 0$. Hence the set $\mathbb{Q}$ of rational numbers is $\lambda$-null: The set $\mathbb{Q}$ is countable, and has an
enumeration $\mathbb{Q} = \{q_n : n \in \mathbb{N}\}$. By countable additivity,

$\lambda(\mathbb{Q}) = \sum_n \lambda\{q_n\} = 0$

If the functions $f, g : \mathbb{R} \to \mathbb{R}$ are defined by

$f = 0, \quad g = 1_{\mathbb{Q}}$

then f, g are equal $\lambda$-almost everywhere.

□

The following exercise is often useful: 
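Exercise 4.2.10 Suppose that $(\Omega, \mathcal{F}, \mu)$ is a measure space, and that $A \in \mathcal{F}$. Define

$\mathcal{F} \cap A = \{F \cap A : F \in \mathcal{F}\}$

(this is an abuse of notation), and let $\mu_A$ be the restriction of $\mu$ to $\mathcal{F} \cap A$. Then $(A, \mathcal{F} \cap A, \mu_A)$ is a measure space also: the
restriction of $(\Omega, \mathcal{F}, \mu)$ to A.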

□ 

4.3 Continuity Properties of Measures 
4.3.1 Limit Operations on Sets 
Definition 4.3.1 Let $(A_n)_{n \in \mathbb{N}}$ be a sequence of sets.

• We say that $(A_n)_n$ is increasing if $A_1 \subseteq A_2 \subseteq A_3 \subseteq \cdots$. We say that $A_n \uparrow A$ if
$(A_n)_n$ is increasing, and $\bigcup_n A_n = A$.

• Similarly, we say that $(A_n)_n$ is decreasing if $A_1 \supseteq A_2 \supseteq A_3 \supseteq \cdots$. We say that
$A_n \downarrow A$ if $(A_n)_n$ is decreasing, and $\bigcap_n A_n = A$.

• We define the limit superior of the sequence $(A_n)_n$ by

$\limsup_n A_n = \bigcap_{n=1}^{\infty} \bigcup_{k=n}^{\infty} A_k$

• We define the limit inferior of the sequence $(A_n)_n$ by

$\liminf_n A_n = \bigcup_{n=1}^{\infty} \bigcap_{k=n}^{\infty} A_k$

Also note the following simple interpretations of the above limit operations:

$x \in \limsup_n A_n \iff \forall n\, [x \in \bigcup_{k=n}^{\infty} A_k]$
$\iff \forall n\, \exists k \ge n\, [x \in A_k]$
$\iff$ x belongs to infinitely many of the sets $A_k$

Similarly,

$x \in \liminf_n A_n \iff \exists n\, [x \in \bigcap_{k=n}^{\infty} A_k]$
$\iff \exists n\, \forall k \ge n\, [x \in A_k]$
$\iff$ x belongs to all the $A_k$ from some n onwards

In particular, $x \in \liminf_n A_n$ iff x belongs to all but finitely many of the $A_n$.

Thus we also write:

$(A_n, \text{i.o.}) = \limsup_n A_n$
$(A_n, \text{ev.}) = \liminf_n A_n$

where i.o. means infinitely often, and ev. means eventually.

Thus $x \in (A_n, \text{i.o.})$ iff x belongs to infinitely many of the sets $A_n$, etc.
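As an illustration of the two limit operations, here is a small Python sketch, not taken from the notes, for a periodic sequence of finite sets. For a sequence of period p, every tail union (or tail intersection) is already determined by p consecutive terms, so the infinite operations reduce to finite ones. The sequence `A` and the helper names are assumptions made for this example.

```python
def A(n):
    """Hypothetical periodic sequence: {0, 2} for even n, {1, 2} for odd n."""
    return {0, 2} if n % 2 == 0 else {1, 2}


PERIOD = 2


def limsup(seq, period):
    """Intersection over n of the tail unions; one period per tail suffices."""
    tails = [set().union(*(seq(k) for k in range(n, n + period)))
             for n in range(1, period + 1)]
    return set.intersection(*tails)


def liminf(seq, period):
    """Union over n of the tail intersections; one period per tail suffices."""
    tails = [set.intersection(*(seq(k) for k in range(n, n + period)))
             for n in range(1, period + 1)]
    return set().union(*tails)


print(limsup(A, PERIOD))  # points lying in infinitely many A_n
print(liminf(A, PERIOD))  # points lying in all but finitely many A_n
```

Here 0 and 1 each occur infinitely often but not eventually, while 2 belongs to every set.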



Proposition 4.3.2 (a) $\liminf_n A_n \subseteq \limsup_n A_n$

(b) $(\limsup_n A_n)^c = \liminf_n A_n^c$ and $(\liminf_n A_n)^c = \limsup_n A_n^c$

Exercise 4.3.3 Prove Proposition 4.3.2. 

□ 

4.3.2 Limits of Sets and Measures 

Proposition 4.3.4 (Continuity properties) 

Suppose that $(\Omega, \mathcal{F}, \mu)$ is a measure space, and let $A_1, A_2, \dots \in \mathcal{F}$.

(a) If $A_n \uparrow A$, then $\mu A_n \uparrow \mu A$.

(b) If $A_n \downarrow A$, and if $\mu A_k < \infty$ for some k, then $\mu A_n \downarrow \mu A$.

(c) If each $\mu A_n < \infty$, and $A_n \to A$, then $\mu A_n \to \mu A$.

Exercise 4.3.5 (a) Prove the preceding proposition.

Also show that (b) may fail if we drop the assumption that at least one of the $\mu A_k$ is finite.

(b) Suppose that $\mu$ is finitely additive on the measurable space $(\Omega, \mathcal{F})$. Show that if

$\mu A_n \to 0$ whenever $A_n \downarrow \emptyset$

then $\mu$ is countably additive.

□ 



We end this section with some important results: 



Proposition 4.3.6 (a) FATOU'S LEMMA: If $\mu$ is a measure on $(\Omega, \mathcal{F})$, and if $A_1, A_2, \dots \in \mathcal{F}$, then

$\mu(\liminf_n A_n) \le \lim_n \mu A_n$ when $\lim_n \mu A_n$ exists.

(b) REVERSE FATOU LEMMA: If $\mu$ is a finite measure on $(\Omega, \mathcal{F})$, and if $A_1, A_2, \dots \in \mathcal{F}$, then

$\mu(\limsup_n A_n) \ge \lim_n \mu A_n$ when $\lim_n \mu A_n$ exists.






Proof: (a) Let $B_n = \bigcap_{m \ge n} A_m$. Then $\liminf_n A_n = \bigcup_n \bigcap_{m \ge n} A_m = \bigcup_n B_n$, so $B_n \uparrow \liminf_n A_n$.
By Propn. 4.3.4(a), we see that $\mu B_n \uparrow \mu(\liminf_n A_n)$. Also, clearly $B_n \subseteq A_n$,
and so $\mu A_n \ge \mu B_n$. It follows that

$\liminf_n \mu A_n \ge \liminf_n \mu B_n = \mu(\liminf_n A_n)$

(b) is left as an exercise.

⊣

Exercise 4.3.7 We prove the reverse Fatou lemma:

(a) Let $B_n = \bigcup_{m \ge n} A_m$. Explain why $B_n \downarrow \limsup_n A_n$. Conclude that $\mu B_n \downarrow \mu(\limsup_n A_n)$.

(b) Explain why $\mu B_n \ge \sup_{m \ge n} \mu A_m$.

(c) Conclude that $\mu(\limsup_n A_n) \ge \lim_n \sup_{m \ge n} \mu A_m = \limsup_n \mu A_n$.

(d) Where did we need the fact that $\mu$ is a finite measure?

□ 

Example 4.3.8 * Recall that the Cantor set is a subset of [0, 1] which is constructed as follows: Let
$C_0 = [0, 1]$. It is a single interval of length 1, so $\lambda(C_0) = 1$. Now let $C_1$ be $C_0$ with its middle third removed,
i.e. $C_1 = [0, \tfrac13] \cup [\tfrac23, 1]$. Thus $C_1$ consists of two disjoint intervals, each of length $\tfrac13$. Hence $\lambda(C_1) = \tfrac23$. Now
remove the middle thirds of these two intervals to form $C_2$, i.e. $C_2 = [0, \tfrac19] \cup [\tfrac29, \tfrac13] \cup [\tfrac23, \tfrac79] \cup [\tfrac89, 1]$. Then $C_2$
is a disjoint union of 4 intervals, each of length $\tfrac19$, so $\lambda(C_2) = \tfrac49 = (\tfrac23)^2$. Continue in this way, removing the
middle thirds of each of the intervals comprising $C_n$ to form $C_{n+1}$. It follows that $C_n$ consists of $2^n$ intervals,
each of length $(\tfrac13)^n$, and thus that $\lambda(C_n) = (\tfrac23)^n$. Finally, let $C = \bigcap_{n=0}^{\infty} C_n$. C is the Cantor set.

The first thing to note is that C is a Borel set (why?)

The second thing to note is that C is not empty; in fact, C is uncountable. Here is one way to see
this: Every real number $a \in [0, 1]$ can be written as an infinite sum $\sum_{i=1}^{\infty} \frac{a_i}{3^i}$, where $a_i = 0, 1$ or 2. Thus
the ternary expansion (as opposed to decimal expansion) of a is $0.a_1a_2a_3\ldots$ For example, $\tfrac13 = 0.1000\ldots$,
$\tfrac59 = \tfrac13 + \tfrac29 = 0.1200\ldots$, etc. A little thought will reveal that the Cantor set is formed by removing all numbers
which have a 1 occurring in their ternary expansion (a number such as $\tfrac13 = 0.1000\ldots = 0.0222\ldots$, which also has an
expansion without a 1, is not removed). Thus $C_1$ is formed by removing all numbers which have
a 1 in the first "decimal" place, $C_2$ is formed by removing all numbers in $C_1$ which have a 1 in the second
"decimal" place, and so on. Thus the Cantor set is just the set of all numbers a in [0, 1] which can be written
as a sum $\sum_{i=1}^{\infty} \frac{a_i}{3^i}$ where $a_i = 0$ or 2. Clearly there are uncountably many such.

Suppose now that once again we perform the random experiment of choosing a number from the interval
[0, 1], assuming as before that each number is equally likely to be chosen. The Lebesgue measure $\lambda$ is the
probability measure which models this situation, and we've stated (but not yet proved) that this measure can
be defined on all Borel subsets of [0, 1]. We now ask: What is the probability that the number chosen belongs
to C, i.e. what is $\lambda(C)$? Since $C \subseteq C_n$ for each $n \in \mathbb{N}$, and since $\lambda(C_n) = (\tfrac23)^n \to 0$, we must have $\lambda(C) = 0$.
Thus the probability that the number chosen belongs to C is zero. Thus C, though a "large" set from the
cardinality point of view, is a small set from the measure point of view.

□ 
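The stages of the construction can be checked numerically. The sketch below is an illustration added here, not part of the notes; it builds the intervals of $C_n$ with exact rational arithmetic and sums their lengths. The function names are assumptions chosen for this example.

```python
from fractions import Fraction


def next_stage(intervals):
    """Remove the open middle third from each closed interval (a, b)."""
    out = []
    for a, b in intervals:
        third = (b - a) / 3
        out.append((a, a + third))
        out.append((b - third, b))
    return out


def cantor_stage(n):
    """Endpoints of the 2**n closed intervals making up C_n."""
    stage = [(Fraction(0), Fraction(1))]
    for _ in range(n):
        stage = next_stage(stage)
    return stage


def total_length(intervals):
    """Lebesgue measure of a disjoint union of intervals."""
    return sum(b - a for a, b in intervals)


for n in range(5):
    print(n, len(cantor_stage(n)), total_length(cantor_stage(n)))
```

The printed lengths follow $(\tfrac23)^n$, which tends to 0 and squeezes $\lambda(C)$ to zero.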

4.4 Lebesgue Measure from Coin Tossing 

Consider again the random experiment of tossing a coin infinitely many times. We want to 
find an appropriate probability space for this experiment. It is clear that, if the coin is fair, 
each outcome is equally likely, and thus that the probability of a given outcome must be zero 






(because there are infinitely many outcomes). We can say this before we have even decided 
upon an appropriate sample space. This we tackle now: 

Letting 1 stand for "Heads" and 0 for "Tails", we take the sample space to be

$\Omega = \{0, 1\}^{\mathbb{N}}$

i.e. $\Omega$ is the set of $\mathbb{N}$-indexed sequences of 0's and 1's. We take a slightly different view,
however. Every sequence of 0's and 1's can be regarded as the dyadic or binary expansion of
a real number. For example, the sequence

1 1 0 1 0 0 0 1 ...

can be thought of as the binary number

$0.11010001\ldots$ (binary) $= 1 \cdot \tfrac12 + 1 \cdot \tfrac14 + 0 \cdot \tfrac18 + 1 \cdot \tfrac1{16} + 0 \cdot \tfrac1{32} + 0 \cdot \tfrac1{64} + 0 \cdot \tfrac1{128} + 1 \cdot \tfrac1{256} + \cdots$
$= 0.816\ldots$ (decimal)

We thus have a correspondence between sequences $(a_n : n \in \mathbb{N})$ of 0's and 1's, and real
numbers between 0 and 1:

$(a_n : n \in \mathbb{N}) \mapsto \sum_{n=1}^{\infty} \frac{a_n}{2^n}$

Clearly $(0, 0, 0, \dots) \mapsto 0$, $(1, 1, 1, \dots) \mapsto 1$, $(1, 0, 0, 0, \dots) \mapsto \tfrac12$, etc. In this way, every
sequence of 0's and 1's provides us with a unique number between 0 and 1. The only problem
is that this correspondence is not one-to-one:

$(1, 0, 0, 0, \dots) \mapsto \tfrac12 \qquad (0, 1, 1, 1, \dots) \mapsto \tfrac12$

However, we can deal with this in a clever way. Call a dyadic expansion terminating if it
eventually ends in all 0's (tails), i.e. if there are only finitely many 1's. Let $T = \{\omega \in \Omega :
\omega \text{ is terminating}\}$. It is clear that every terminating expansion other than $0.000\ldots$ has associated with it a
non-terminating expansion that corresponds to the same number: If $a_l = 1$ is the last 1 in the
terminating sequence $(a_n : n \in \mathbb{N})$, then

$0.a_1a_2a_3 \ldots a_{l-1}1000\ldots$ (binary) $= 0.a_1a_2a_3 \ldots a_{l-1}0111\ldots$ (binary)

Moreover, a little thought shows that if we chuck out the terminating sequences from $\Omega$, then
the correspondence above is a bijection between $\Omega - T$ and (0, 1]. How many such terminating
dyadic expansions are there? It is not hard to see that T is countable. Moreover, since each
element of $\Omega$ has probability 0, the probability of the event T is also 0 (being a countable
union of events of probability 0). It is therefore practically certain that the event T won't
occur. The terminating dyadic expansions are therefore, in a sense, redundant. Nothing is
lost by chucking them out (except a set of measure 0). We may therefore take the sample
space to be the set

$\Omega = (0, 1]$
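The bijection between non-terminating expansions and (0, 1] can be made concrete. The following Python sketch is an added illustration, not the author's construction; it produces the first bits of the non-terminating binary expansion of a rational number in (0, 1]. The function names are assumptions. The strict comparison `x > 1` is what forces, for example, $\tfrac12$ to expand as 0.0111... rather than 0.1000...

```python
from fractions import Fraction


def nonterminating_bits(x, n):
    """First n bits of the non-terminating binary expansion of x in (0, 1]."""
    if not 0 < x <= 1:
        raise ValueError("x must lie in (0, 1]")
    bits = []
    for _ in range(n):
        x = 2 * x
        if x > 1:          # strict: keeps the remainder inside (0, 1]
            bits.append(1)
            x -= 1
        else:
            bits.append(0)
    return bits


def value_of(bits):
    """Value of the finite binary expansion 0.b1 b2 ... bn."""
    return sum(Fraction(b, 2 ** k) for k, b in enumerate(bits, start=1))


print(nonterminating_bits(Fraction(1, 2), 6))
print(value_of([1, 1, 0, 1, 0, 0, 0, 1]))
```

The second call evaluates the first eight bits of the example sequence from the text and gives 209/256, i.e. 0.816...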






A natural algebra for this experiment is

$\mathcal{F}$ = all events that can be decided after only finitely many tosses

Here are some examples of elements of $\mathcal{F}$:

(a) A = the first toss is "Heads". These are the dyadic numbers with a 1 in the first place,
i.e. all the numbers from $0.1000\ldots = \tfrac12$ to $0.1111\ldots = 1$, but not including $0.1000\ldots$,
because it is terminating. Thus $A = (\tfrac12, 1]$.

(b) B = the third toss is "Tails". This is the set of all dyadic numbers with a 0 in the third
place. These are the numbers in the intervals

binary                           decimal
(0.000000..., 0.000111...]       (0, 0.125]
(0.010000..., 0.010111...]       (0.25, 0.375]
(0.100000..., 0.100111...]       (0.5, 0.625]
(0.110000..., 0.110111...]       (0.75, 0.875]

Hence B = (0, 0.125] ∪ (0.25, 0.375] ∪ (0.5, 0.625] ∪ (0.75, 0.875].

(c) C = There are 2 "Heads" and 1 "Tail" in the first 3 tosses. A little thought shows that

C = (0.375, 0.5] ∪ (0.625, 0.75] ∪ (0.75, 0.875]

Reasoning along these lines, it is clear that $\mathcal{F}$ is the $\sigma$-algebra generated by all intervals of
the form $(\frac{k}{2^n}, \frac{k+1}{2^n}]$, where $n \in \mathbb{N}$ and $k < 2^n$. It is therefore not hard to see that $\mathcal{F} = \mathcal{B}(0, 1]$
(since every real number can be approximated arbitrarily closely by a dyadic rational, i.e. a
number of the form $\frac{k}{2^n}$).

Having identified the "right" $\sigma$-algebra, we turn to the probability measure appropriate
for this experiment.

(a) $\mathbb{P}(A)$ = probability that first toss is "Heads". Assuming a fair coin, this is clearly $\tfrac12$. Now
the event A corresponds to the interval $(\tfrac12, 1]$, and $\lambda(\tfrac12, 1] = \tfrac12$ (where $\lambda$ is the Lebesgue
measure introduced in Example 4.2.2). Thus $\mathbb{P}(A) = \lambda(A)$.

(b) $\mathbb{P}(B)$ = probability that third toss lands "Tails". This is clearly also $\tfrac12$, as the third toss
is just as likely to land "Heads" as it is "Tails". Now $\lambda(B) = \lambda(0, 0.125] + \lambda(0.25, 0.375] +
\lambda(0.5, 0.625] + \lambda(0.75, 0.875] = 4 \times \tfrac18 = \tfrac12$, and thus $\mathbb{P}(B) = \lambda(B)$ in this case also.

(c) $\mathbb{P}(C)$ = probability that there are 2 heads and 1 tail in the first 3 tosses. This probability
is clearly $\binom{3}{2} 2^{-3} = \tfrac38$. In this case we therefore also have $\mathbb{P}(C) = \lambda(C)$.

It therefore becomes apparent that the "right" probability space for the random experiment
of tossing a coin infinitely many times is just the same as that for the random experiment of picking
a number from (0, 1] (Example 1.3.4).

All of this leads us to formulate the following principle:






Borel's Principle:

Consider the random experiment of tossing a (fair) coin infinitely many times, and let E
be an event. Interpret E as a subset of (0, 1]. Then $\mathbb{P}(E) = \lambda(E)$, i.e. the probability
that the event E occurs is just the Lebesgue measure of the associated subset of the unit
interval.
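For events decided by the first n tosses, the principle can be checked by brute force. The Python sketch below is an added illustration under that restriction, not a proof; it maps each pattern of n tosses to its dyadic interval $(\frac{k}{2^n}, \frac{k+1}{2^n}]$ and sums the lengths. The names `event_intervals` and `lebesgue_measure` are assumptions, and each tuple (a, b) stands for the half-open interval (a, b].

```python
from fractions import Fraction
from itertools import product


def event_intervals(n, predicate):
    """Dyadic intervals (k/2^n, (k+1)/2^n] for the toss patterns in the event."""
    intervals = []
    for bits in product((0, 1), repeat=n):
        if predicate(bits):
            k = int("".join(map(str, bits)), 2)   # pattern read as a binary integer
            intervals.append((Fraction(k, 2 ** n), Fraction(k + 1, 2 ** n)))
    return intervals


def lebesgue_measure(intervals):
    """Total length of a disjoint family of intervals."""
    return sum(b - a for a, b in intervals)


A = event_intervals(1, lambda t: t[0] == 1)        # first toss is Heads
B = event_intervals(3, lambda t: t[2] == 0)        # third toss is Tails
C = event_intervals(3, lambda t: sum(t) == 2)      # 2 Heads, 1 Tail in 3 tosses

for name, ev in (("A", A), ("B", B), ("C", C)):
    print(name, ev, lebesgue_measure(ev))
```

The three measures agree with the probabilities $\tfrac12$, $\tfrac12$ and $\tfrac38$ computed in the text.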

Exercise 4.4.1 This exercise is meant to get you thinking. Don't worry too much about the details. 

(a) Consider a random experiment in which a coin is tossed 3 times. Describe a suitable sample space and
$\sigma$-algebra for this experiment. How many elements does this $\sigma$-algebra have?

(b) Consider a random experiment in which a coin is tossed infinitely many times. From lecture notes, you
know that we can take the sample space to be the interval (0, 1].

(i) What is the smallest $\sigma$-algebra on (0, 1] which contains the information about the outcomes of the
first 3 coin tosses? How many elements does it have?

(ii) Compare your answer in (i) to that in (a).

(iii) What is the smallest $\sigma$-algebra on (0, 1] which contains information about the outcomes of all the
tosses? [To know the outcome of all the tosses, you have to know the outcome of the first n tosses
for every $n \in \mathbb{N}$.]

□ 

Exercise 4.4.2 Consider the experiment consisting of an infinite sequence of coin flips. For each of the 
following, find the Borel set that corresponds to the event, and calculate its probability. 

(a) The first head comes immediately after an even number of tails.

(b) At six flips, but no earlier, the number of heads equals the number of tails.

(c) The sequence HHT occurs before the sequence THT. 

□ 

Exercise 4.4.3 A gambler starts with an initial stake of 2 Rands. She bets at even odds on a coin toss, 
i.e. she wins one rand if the coin lands H, and loses one rand if it lands T. We are going to investigate the 
event that she is ruined i.e. loses all her stake. 

If she keeps playing until she is ruined, this may involve a potentially infinite number of coin tosses. From
lectures, we know that this situation may be modelled by the probability space $((0, 1], \mathcal{B}(0, 1], \lambda)$, where $\lambda$ is
the Lebesgue measure on (0, 1]. The gambler's sequence of play is described by an outcome $\omega = 0.\omega_1\omega_2\omega_3\ldots$
which is a sequence of 0's and 1's representing a real number in binary.

(a) What is the probability that she is ruined on the 1st toss?

(b) (i) What is the probability that she is ruined on the 2nd toss?

(ii) Which Borel set corresponds to her being ruined on the 2nd toss?

(iii) What is the Lebesgue measure of this set?

(c) What is the probability that she is ruined on the 3rd toss (but not before)?

(d) (i) What is the probability that she is ruined on the 4th toss, but not before?

(ii) Which Borel set corresponds to her being ruined on the 4th toss?

(iii) What is the Lebesgue measure of this set?

We now investigate a little deeper: Let $s_n(\omega)$ be the amount that she has won (or lost, if negative) after n
tosses if the outcome is $\omega$. Thus $s_n(\omega) = \sum_{k=1}^{n} r_k(\omega)$ where the $r_k(\omega)$ are the Rademacher functions, defined
by

$r_k(\omega) = 1$ if $\omega_k = 1$, i.e. the k-th toss lands Heads
$r_k(\omega) = -1$ if $\omega_k = 0$, i.e. the k-th toss lands Tails






(e) Let N be an integer. Explain why $\{\omega \in (0, 1] : s_n(\omega) = N\}$ is a Borel set. What is the measure of this
set?

(f) Let N be an integer. Explain why $\{\omega \in (0, 1] : s_n(\omega) > N\}$ is a Borel set. Conclude that each $s_n$ is a
measurable function (i.e. a random variable).

(g) Explain why the set

$R_n = \{\omega \in (0, 1] : s_k(\omega) > -2 \text{ for } 1 \le k < n,\ s_n(\omega) = -2\}$

is a Borel set. What event does it describe?

(h) What event is described by the set $R = \bigcup_{n=1}^{\infty} R_n$?

n=l 

□ 
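Answers to the first parts of the exercise can be tested by brute force. The Python sketch below is an added aid, not part of the notes; it enumerates all $2^n$ toss patterns and counts those in which the fortune first hits zero exactly at toss n. The names `STAKE`, `ruined_exactly_at` and `ruin_probability` are assumptions for this example.

```python
from fractions import Fraction
from itertools import product

STAKE = 2  # initial stake in Rands, as in the exercise


def ruined_exactly_at(tosses):
    """True if the fortune first reaches 0 at the last toss of the pattern."""
    fortune = STAKE
    for i, t in enumerate(tosses, start=1):
        fortune += 1 if t == 1 else -1   # 1 = Heads (win), 0 = Tails (lose)
        if fortune == 0:
            return i == len(tosses)
    return False


def ruin_probability(n):
    """Probability of being ruined on toss n, but not before, for a fair coin."""
    hits = sum(ruined_exactly_at(seq) for seq in product((0, 1), repeat=n))
    return Fraction(hits, 2 ** n)


for n in range(1, 9):
    print(n, ruin_probability(n))
```

By Borel's Principle, each printed probability is also the Lebesgue measure of the corresponding union of dyadic intervals.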

4.5 Some Probability Theory 

Probability is all about information. I toss a coin and see that it lands heads. You don't see
the coin. For you the probability that the coin has landed heads is $\tfrac12$, but for me it is 1. New
information changes the probability measures.

Let $(\Omega, \mathcal{F}, \mathbb{P})$ be a probability space, and let A, B be events. Knowledge that B has
occurred can change our estimation of the probability that A has occurred. We write $\mathbb{P}(A|B)$
for the probability that A occurs given that we know that B has occurred. We call $\mathbb{P}(A|B)$
the conditional probability of A given B.

Example 4.5.1 A die is rolled. Let A be the event that the outcome is a 6, let B be the event that the
outcome is an even number, and let C be the event that the outcome is an odd number. Clearly $\mathbb{P}(A) = \tfrac16$.
However, if we know for sure that the outcome is an even number, then the probability of getting a 6 is $\tfrac13$, i.e.
$\mathbb{P}(A|B) = \tfrac13$. In the same way, if B occurs, then C cannot possibly occur, so although $\mathbb{P}(C) = \tfrac12$, $\mathbb{P}(C|B) = 0$.

□ 



Basically, what's happening here is that we have to modify our probability measure to
accommodate the "new" information that B has occurred. If $\mathbb{P}(\,\cdot\,|B)$ is the new probability
measure on $(\Omega, \mathcal{F})$, then we must have $\mathbb{P}(B|B) = 1$ and $\mathbb{P}(\Omega - B|B) = 0$. If A is another
event, then A occurs if and only if $A \cap B$ occurs, since we know that B also occurs, and
it makes sense to assume that the new probability that A occurs is proportional to the old
probability that $A \cap B$ occurs, i.e. that $\mathbb{P}(A|B) = c\,\mathbb{P}(A \cap B)$ for some constant c. To ensure
$\mathbb{P}(B|B) = 1$, we must have $c = \mathbb{P}(B)^{-1}$ (which requires $\mathbb{P}(B) > 0$). We therefore find that

$\mathbb{P}(A|B) = \frac{\mathbb{P}(A \cap B)}{\mathbb{P}(B)}$

the standard formula given in elementary probability theory texts.

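Exercises 4.5.2 (1.) Prove that $\mathbb{P}(\,\cdot\,|B)$ is a probability measure on $(\Omega, \mathcal{F})$.
(2.) For events $A_1, \dots, A_n$, prove that

$\mathbb{P}(A_1 \cap \dots \cap A_n) = \mathbb{P}(A_1)\,\mathbb{P}(A_2|A_1)\,\mathbb{P}(A_3|A_1 \cap A_2) \cdots \mathbb{P}(A_n|A_1 \cap \dots \cap A_{n-1})$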

□ 






Example 4.5.3 A couple has two children. Assuming that boys and girls are equally likely, and given that
one of the children is a girl, what is the probability that the other child is also a girl?
We can model this probability space as follows:

$\Omega = \{BB, BG, GB, GG\}$

$\mathcal{F} = \mathcal{P}(\Omega)$

Let B be the event that at least one of the children is a girl, i.e. B = {GB, BG, GG}, and let A be the event
that both children are girls, i.e. A = {GG}. Then

$\mathbb{P}(A|B) = \frac{\mathbb{P}(A \cap B)}{\mathbb{P}(B)} = \frac{1/4}{3/4} = \frac13$

□ 
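The computation in this example can be reproduced by direct enumeration. The following Python sketch is an added illustration, assuming the uniform measure on the four outcomes as in the text; the helper names `prob` and `cond_prob` are not from the notes.

```python
from fractions import Fraction
from itertools import product

# Sample space {BB, BG, GB, GG}, each outcome equally likely.
OMEGA = ["".join(p) for p in product("BG", repeat=2)]


def prob(event):
    """Uniform probability of a set of outcomes."""
    return Fraction(len(event), len(OMEGA))


def cond_prob(a, b):
    """P(a | b) = P(a and b) / P(b); assumes P(b) > 0."""
    return prob(a & b) / prob(b)


at_least_one_girl = {w for w in OMEGA if "G" in w}
both_girls = {"GG"}

print(cond_prob(both_girls, at_least_one_girl))
```

The result is $\tfrac13$, not $\tfrac12$, because conditioning on "at least one girl" leaves three equally likely outcomes.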



Two events A, B are said to be independent if knowledge of B tells us nothing about A,
and vice versa. By this we mean that our estimation of the probability that A occurs isn't
changed by the knowledge that B has occurred. Thus:

$\mathbb{P}(A|B) = \mathbb{P}(A)$

and hence we have the multiplication law

$\mathbb{P}(A \cap B) = \mathbb{P}(A) \cdot \mathbb{P}(B)$

The above equation gives us the definition of independent events:

Definition 4.5.4 Let $(\Omega, \mathcal{F}, \mathbb{P})$ be a probability space. A (possibly infinite) set $\mathcal{A} = \{A_i :
i \in I\}$ of events is said to be an independent family provided that for any distinct
$i_1, i_2, \dots, i_n \in I$

$\mathbb{P}(A_{i_1} \cap A_{i_2} \cap \dots \cap A_{i_n}) = \mathbb{P}(A_{i_1})\,\mathbb{P}(A_{i_2}) \cdots \mathbb{P}(A_{i_n})$

Example 4.5.5 (a) Consider the random trial of tossing a coin twice. The sample space $\Omega$ is the 4 element
set {HH, HT, TH, TT} and the associated $\sigma$-algebra is just $\mathcal{P}(\Omega)$. Intuitively, if the coin is fair, the
outcome of the first coin should have no influence on the second. Thus knowing that the first coin has
landed heads should make no difference to whether the second coin lands heads. Let B = {HH, HT} be
the event that the first coin lands heads, and let A = {HH, TH} be the event that the second coin lands
heads. Then $\mathbb{P}(A \cap B) = \mathbb{P}(\{HH\}) = \tfrac14$, and $\mathbb{P}(A) \cdot \mathbb{P}(B) = \tfrac12 \cdot \tfrac12 = \tfrac14$. Thus $\mathbb{P}(A \cap B) = \mathbb{P}(A) \cdot \mathbb{P}(B)$, i.e.
the events A and B are indeed independent.

(b) Consider the same experiment as in (a), but with one important difference: Before the experiment starts,
we are told that the coin is unfair. It has either two heads, or two tails, but we are not told which. Each
possibility is equally likely.

To model this, we use a different probability measure $\mathbb{Q}$, which has

$\mathbb{Q}(\{HH\}) = \tfrac12 = \mathbb{Q}(\{TT\}) \qquad \mathbb{Q}(\{HT\}) = 0 = \mathbb{Q}(\{TH\})$

In this case $\mathbb{Q}(A \cap B) = \tfrac12$, whereas $\mathbb{Q}(A)\mathbb{Q}(B) = \tfrac14$. Thus A and B are not independent under $\mathbb{Q}$.



□ 
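The dependence of independence on the measure can be checked mechanically. In the Python sketch below, added here as an illustration, `P` and `Q` are the two measures from the example stored as dictionaries; the helpers `measure` and `independent` are assumed names.

```python
from fractions import Fraction

OMEGA = ["HH", "HT", "TH", "TT"]

# Fair coin: all four outcomes equally likely.
P = {w: Fraction(1, 4) for w in OMEGA}
# Two-headed or two-tailed coin, each with probability 1/2.
Q = {"HH": Fraction(1, 2), "HT": Fraction(0), "TH": Fraction(0), "TT": Fraction(1, 2)}


def measure(m, event):
    """Measure of an event (a set of outcomes) under the point masses m."""
    return sum(m[w] for w in event)


def independent(m, a, b):
    """Check the multiplication law m(a & b) = m(a) * m(b)."""
    return measure(m, a & b) == measure(m, a) * measure(m, b)


A = {"HH", "TH"}   # second coin lands heads
B = {"HH", "HT"}   # first coin lands heads

print(independent(P, A, B), independent(Q, A, B))
```

The same pair of events passes the test under `P` and fails it under `Q`.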






It's worth pointing out once more that it depends on the probability measure whether
or not two events are independent, i.e. it is possible for events to be independent under
one measure, but not under another. The notion of independence is therefore a probabilistic
notion, which has no analogue in general measure theory.

The intuitive idea about independence was the following: Two events are independent if
the information that one of the events has occurred does not tell us anything new about the
other, i.e. it does not lead us to revise our estimation of its probability. Now $\sigma$-algebras are
the carriers of information, and we would therefore like a definition of independence which
involves $\sigma$-algebras. We therefore define independence anew:

Definition 4.5.6 Let $(\Omega, \mathcal{F}, \mathbb{P})$ be a probability space. Sub-$\sigma$-algebras $\mathcal{G}_1, \mathcal{G}_2, \dots$ of $\mathcal{F}$
are said to be independent if events in distinct $\mathcal{G}_n$ are independent, i.e. whenever
$n_1, n_2, \dots, n_m \in \mathbb{N}$ are distinct positive integers and $G_{n_1} \in \mathcal{G}_{n_1}, G_{n_2} \in \mathcal{G}_{n_2}, \dots, G_{n_m} \in \mathcal{G}_{n_m}$
are events, we have

$\mathbb{P}(G_{n_1} \cap G_{n_2} \cap \dots \cap G_{n_m}) = \prod_{k \le m} \mathbb{P}(G_{n_k})$

The basic idea is that two $\sigma$-algebras are independent if there is no information about an
event in one of the $\sigma$-algebras that would lead us to revise our estimate of the probability of
any event in the other $\sigma$-algebra.

Example 4.5.7 Suppose that A, B are events in some probability space $(\Omega, \mathcal{F}, \mathbb{P})$. Then $\mathcal{A} = \{\Omega, A, A^c, \emptyset\}$
and $\mathcal{B} = \{\Omega, B, B^c, \emptyset\}$ are the $\sigma$-algebras of events that can be decided by knowledge of A, B respectively. It
is easy to show that A, B are independent events if and only if $\mathcal{A}, \mathcal{B}$ are independent $\sigma$-algebras. For example,
$\mathbb{P}(A) = \mathbb{P}(A \cap B) + \mathbb{P}(A \cap B^c)$, and thus $\mathbb{P}(A \cap B^c) = \mathbb{P}(A)[1 - \mathbb{P}(B)] = \mathbb{P}(A)\mathbb{P}(B^c)$, by independence of
A, B. It follows that $A, B^c$ are independent if A, B are. The other combinations of events are similarly proven
independent.

□ 

Remarks 4.5.8 Can an event be independent of itself, i.e. given an event A, can the events A, A be
independent? Here we have to be a little careful. From the intuitive point of view, the answer would seem to
be no, since the information that the event A has occurred will certainly make us re-evaluate our estimation
of the probability that A has occurred! However, if we look at the definition, A and A will be independent
provided $\mathbb{P}(A \cap A) = \mathbb{P}(A)\mathbb{P}(A)$, i.e. provided $\mathbb{P}(A) = \mathbb{P}(A)^2$. This can happen only if $\mathbb{P}(A)$ is either 0 or 1.
That's not too far removed from our intuition. If $\mathbb{P}(A) = 1$, for example, then A happens almost surely, so
telling us that A has happened does not really give us any information. We were practically certain that it
would anyway.

□ 

Exercises 4.5.9 (1.) A gambling game involves the rolling of a fair die followed by the flipping of a fair 
coin. 

(a) Set up a reasonable probability space to model this situation. 

(b) Let A be the event that the die lands on an even number, and let B be the event that the coin lands 
tails. Show that A and B are independent events. 

(2.) An HIV-test is 95% accurate, i.e. it gives the correct result 95% of the time. John lives in a small town
with 1000 inhabitants, 50 of whom have AIDS. After a particularly wild night, John decides to have an
HIV-test, which comes out positive. What is the probability that John has AIDS?



□ 






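Let $(\Omega, \mathcal{F}, \mathbb{P})$ be a probability space, and let $\mathcal{A} = \{A_n : n \in \mathbb{N}\}$ be a countable set of
events. It may help to think of $\mathcal{A}$ as a sequence of events, $A_{n+1}$ following $A_n$.

Proposition 4.5.10 (First Borel-Cantelli Lemma)

Let $(\Omega, \mathcal{F}, \mathbb{P})$ be a probability space, and let $\{A_n : n \in \mathbb{N}\}$ be a sequence of events. If

$\sum_n \mathbb{P}(A_n) < \infty$

then

$\mathbb{P}(A_n, \text{i.o.}) = 0$

Proof: Let $B_n = \bigcup_{k=n}^{\infty} A_k$. Then $B_n \downarrow \limsup_n A_n = (A_n, \text{i.o.})$. Hence

$\mathbb{P}(A_n, \text{i.o.}) \le \mathbb{P}(B_n) \le \sum_{k=n}^{\infty} \mathbb{P}(A_k)$

for all $n \in \mathbb{N}$. Now as $n \to \infty$, the right-hand sum goes to zero, since $\sum_n \mathbb{P}(A_n)$ converges.
Hence $\mathbb{P}(A_n, \text{i.o.}) = 0$.

⊣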



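Proposition 4.5.11 (Second Borel-Cantelli Lemma)

Let $(\Omega, \mathcal{F}, \mathbb{P})$ be a probability space, and let $\{A_n : n \in \mathbb{N}\}$ be a family of independent events.
If

$\sum_n \mathbb{P}(A_n) = \infty$

then

$\mathbb{P}(A_n, \text{i.o.}) = 1$

Proof: The proof depends on the fact that $1 - x \le e^{-x}$ for all $x \in \mathbb{R}$, an inequality which is
easily proved using first-year calculus. Now clearly $(A_n, \text{i.o.}) = \limsup_n A_n = (\liminf_n A_n^c)^c =
(A_n^c, \text{ev.})^c$, and thus it suffices to prove that $\mathbb{P}(A_n^c, \text{ev.}) = 0$. But $(A_n^c, \text{ev.}) = \bigcup_{n=1}^{\infty} \bigcap_{k=n}^{\infty} A_k^c$ by
definition, and so it suffices to show that $\mathbb{P}(\bigcap_{k=n}^{\infty} A_k^c) = 0$ for all n. Now by independence of
the $A_n$, and thus the $A_n^c$, we have

$\mathbb{P}(\bigcap_{k=n}^{n+m} A_k^c) = \prod_{k=n}^{n+m} [1 - \mathbb{P}(A_k)] \le \prod_{k=n}^{n+m} e^{-\mathbb{P}(A_k)} = e^{-\sum_{k=n}^{n+m} \mathbb{P}(A_k)}$

for all $m \in \mathbb{N}$. Now since $\sum_n \mathbb{P}(A_n)$ diverges, the power of e on the right must tend to zero
as $m \to \infty$. Thus $\mathbb{P}(\bigcap_{k=n}^{\infty} A_k^c) = \lim_m \mathbb{P}(\bigcap_{k=n}^{n+m} A_k^c) = 0$, as required.

⊣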
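The key estimate in the proof can be seen numerically. The Python sketch below is an added illustration for one hypothetical choice, independent events with $\mathbb{P}(A_k) = 1/k$ for $k \ge 2$, whose probabilities sum to infinity. It compares the exact product $\prod_{k=n}^{n+m} [1 - \mathbb{P}(A_k)]$ with the exponential bound. The function names are assumptions.

```python
import math
from fractions import Fraction


def prob_none_occur(n, m):
    """P(no A_k occurs for n <= k <= n+m), assuming independence and P(A_k) = 1/k."""
    result = Fraction(1)
    for k in range(n, n + m + 1):
        result *= 1 - Fraction(1, k)
    return result


def exponential_bound(n, m):
    """The bound exp(-sum of P(A_k)) obtained from 1 - x <= exp(-x)."""
    return math.exp(-sum(1 / k for k in range(n, n + m + 1)))


for m in (1, 10, 100, 1000):
    print(m, float(prob_none_occur(2, m)), exponential_bound(2, m))
```

For this choice the product telescopes to $(n-1)/(n+m)$, so both columns visibly tend to 0 as m grows.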






Remarks 4.5.12 The First Borel-Cantelli Lemma says that given events $A_n$, not necessarily independent,
if the sum of the probabilities $\mathbb{P}(A_n)$ converges, then $(A_n, \text{i.o.})$ is an event of zero probability. The Second
Borel-Cantelli Lemma says that if the $A_n$ are independent and the sum of the probabilities $\mathbb{P}(A_n)$ diverges,
then the event $(A_n, \text{i.o.})$ occurs almost surely, i.e. with probability 1. Thus for independent events $A_n$, there
is no middle road: $(A_n, \text{i.o.})$ is either an event of probability 0 or an event of probability 1.

□ 

Exercise 4.5.13 It is sometimes asserted that if a monkey hits the keys of a typewriter at random, it
would eventually produce, in one continuous stream, the complete works of William Shakespeare. Prove it.

□ 

4.6 Extension of Measures* 
4.6.1 Other Families of Sets* 

It turns out that $\sigma$-algebras can be quite complicated to deal with, especially if the sample
space is infinite. In many cases, it is easier to work with simpler classes of sets, especially
$\pi$-systems.

Definition 4.6.1 Let $\mathcal{C}$ be a collection of subsets of $\Omega$.

(a) $\mathcal{C}$ is called a $\pi$-system if it is closed under finite intersections.

(b) $\mathcal{C}$ is called a $\lambda$-system if

(i) $\Omega \in \mathcal{C}$;

(ii) $A, B \in \mathcal{C}$ and $A \subseteq B$ implies $B - A \in \mathcal{C}$;

(iii) If $A_1, A_2, \dots \in \mathcal{C}$ and $A_n \uparrow A$ then $A \in \mathcal{C}$.

(c) We denote by $\pi(\mathcal{C})$ and $\lambda(\mathcal{C})$ the $\pi$-, respectively, $\lambda$-system generated by $\mathcal{C}$, i.e. the
smallest $\pi$-, respectively, $\lambda$-system on $\Omega$ which contains $\mathcal{C}$.

Why do $\pi(\mathcal{C})$, $\lambda(\mathcal{C})$ always exist? It follows from the easily proved fact that the intersection
of an arbitrary family of $\pi$-systems (resp. $\lambda$-systems) is again a $\pi$-system (resp. $\lambda$-system).
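The two notions in Definition 4.6.1 are genuinely different. The Python sketch below is an added illustration on a hypothetical four-point set, where the subsets of even cardinality form a $\lambda$-system that is not a $\pi$-system. The checker names are assumptions. On a finite set every increasing sequence of sets is eventually constant, so condition (iii) holds automatically and is not tested.

```python
from itertools import combinations

OMEGA = frozenset({1, 2, 3, 4})


def is_pi_system(c):
    """Closed under pairwise (hence finite) intersections."""
    return all(a & b in c for a in c for b in c)


def is_lambda_system(c):
    """Conditions (i) and (ii); (iii) is automatic because OMEGA is finite."""
    return OMEGA in c and all(b - a in c for a in c for b in c if a <= b)


# All subsets of OMEGA with an even number of elements.
even_sets = {frozenset(s) for r in (0, 2, 4) for s in combinations(sorted(OMEGA), r)}
# The full power set, for comparison.
power_set = {frozenset(s) for r in range(5) for s in combinations(sorted(OMEGA), r)}

print(is_lambda_system(even_sets), is_pi_system(even_sets))
print(is_lambda_system(power_set), is_pi_system(power_set))
```

The failure of the $\pi$-system property comes from pairs such as {1, 2} and {2, 3}, whose intersection {2} has odd size.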

Proposition 4.6.2 A family $\mathcal{C}$ of subsets of $\Omega$ is a $\sigma$-algebra iff it is both a $\pi$-system
and a $\lambda$-system.

Proof: It is clear that a $\sigma$-algebra is also a $\pi$-system and a $\lambda$-system.

Conversely, suppose $\mathcal{C}$ is both a $\pi$- and a $\lambda$-system. Then $\mathcal{C}$ is closed under complementation,
by (i), (ii) of Defn. 4.6.1(b). De Morgan's Laws applied to Defn. 4.6.1(a) show that
$\mathcal{C}$ is closed under finite unions. Finally, given $A_1, A_2, \dots \in \mathcal{C}$, let $A = \bigcup_n A_n$. Define

$B_n = \bigcup_{m \le n} A_m$

Then each $B_n \in \mathcal{C}$, and $B_n \uparrow A$. Hence $A \in \mathcal{C}$, by Defn. 4.6.1(b)(iii), and thus $\mathcal{C}$ is closed
under countable unions.






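The following technical result often allows us to work with "easy" $\pi$-systems, instead of
the "difficult" $\sigma$-algebras:

Theorem 4.6.3 (Dynkin's Lemma, Monotone Class Theorem)

(a) If $\mathcal{C}$ is a $\pi$-system on $\Omega$, then

$\lambda(\mathcal{C}) = \sigma(\mathcal{C})$

(b) Suppose that $\mathcal{C}$ is a $\pi$-system and that $\mathcal{D}$ is a $\lambda$-system (both on a set $\Omega$), and also
that $\mathcal{C} \subseteq \mathcal{D}$. Then $\sigma(\mathcal{C}) \subseteq \mathcal{D}$.

Proof: (a) Let $\mathcal{D} = \lambda(\mathcal{C})$. By Propn. 4.6.2, it suffices to show that $\mathcal{D}$ is a $\pi$-system. We do
this in two steps.

STEP I: Fix $C \in \mathcal{C}$, and define

$\mathcal{D}_C = \{A \in \mathcal{D} : A \cap C \in \mathcal{D}\}$

Then $\mathcal{C} \subseteq \mathcal{D}_C \subseteq \mathcal{D}$ (because $\mathcal{C}$ is a $\pi$-system). We now show that $\mathcal{D}_C = \mathcal{D}$. To that end, it
suffices to show that $\mathcal{D}_C$ is a $\lambda$-system (because then $\mathcal{D}_C$ is a $\lambda$-system containing $\mathcal{C}$, and $\mathcal{D}$
is the smallest such). We therefore verify (i)-(iii) of Defn. 4.6.1(b):

(i) is obvious.

(ii) If $A, B \in \mathcal{D}_C$ and $A \subseteq B$, then $(B - A) \cap C = (B \cap C) - (A \cap C)$. But $B \cap C, A \cap C \in \mathcal{D}$ by
definition of $\mathcal{D}_C$, and thus $(B - A) \cap C \in \mathcal{D}$, because $\mathcal{D}$ is a $\lambda$-system. Thus $(B - A) \in \mathcal{D}_C$.

(iii) Finally, if $A_1, A_2, \dots \in \mathcal{D}_C$ and $A_n \uparrow A$, then $A_1 \cap C, A_2 \cap C, \dots \in \mathcal{D}$ and $(A_n \cap C) \uparrow A \cap C$.
Hence $A \cap C \in \mathcal{D}$, and so $A \in \mathcal{D}_C$.

We now know that $\mathcal{D}_C = \mathcal{D}$ for every $C \in \mathcal{C}$.

STEP II: Now, fix any $D \in \mathcal{D}$, and define

$\mathcal{D}^D = \{A \in \mathcal{D} : A \cap D \in \mathcal{D}\}$

First note that if $C \in \mathcal{C}$, then $\mathcal{D}_C = \mathcal{D}$, so $D \in \mathcal{D}_C$. It follows that $D \cap C \in \mathcal{D}$, and thus that
$C \in \mathcal{D}^D$, for every $C \in \mathcal{C}$. Thus $\mathcal{C} \subseteq \mathcal{D}^D$, for all $D \in \mathcal{D}$.

It follows as above that $\mathcal{D}^D$ is a $\lambda$-system, and thus that $\mathcal{D}^D = \mathcal{D}$, for all $D \in \mathcal{D}$.

In particular, if $A, B \in \mathcal{D}$, then $A \in \mathcal{D}^B$, and so $A \cap B \in \mathcal{D}$. This shows that $\mathcal{D}$ is a
$\pi$-system, and thus a $\sigma$-algebra.

(b) follows directly from (a). (Why?)

⊣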

4.6.2 The Extension Theorem* 

This section is mainly technical: The Caratheodory Extension Theorem (Thm. 4.6.9) ensures
that commonly used measures, such as Lebesgue measure, exist. Before we tackle that, here
is an easy but useful result: Two probability measures which agree on a $\pi$-system agree on
the $\sigma$-algebra generated by that $\pi$-system.

Proposition 4.6.4 Suppose that $\mu_1, \mu_2$ are finite measures on a measurable space $(\Omega, \mathcal{F})$,
and let $\mathcal{C}$ be a $\pi$-system such that $\Omega \in \mathcal{C}$ and $\sigma(\mathcal{C}) = \mathcal{F}$. Then $\mu_1 = \mu_2$ iff $\mu_1, \mu_2$ agree on
$\mathcal{C}$.






Exercise 4.6.5 Prove the preceding proposition. 

[Hint: Show that $\mathcal{D} = \{A \in \mathcal{F} : \mu_1 A = \mu_2 A\}$ is a $\lambda$-system, and that $\mathcal{C} \subseteq \mathcal{D}$. Then apply Dynkin's Lemma
(Thm. 4.6.3)]

□ 



Definition 4.6.6 Let $\Omega$ be a non-empty set.

(a) A map $\mu : \mathcal{P}(\Omega) \to [0, +\infty]$ is said to be an outer measure on $\Omega$ iff:

(i) $\mu\emptyset = 0$;

(ii) $\mu$ is monotone: $A \subseteq B$ implies $\mu A \le \mu B$;

(iii) $\mu$ is countably sub-additive: If $A_1, A_2, \dots \subseteq \Omega$, then $\mu(\bigcup_n A_n) \le \sum_n \mu A_n$.

(b) A set $A \subseteq \Omega$ is said to be $\mu$-measurable if and only if

$\mu E = \mu(E \cap A) + \mu(E \cap A^c)$ for all $E \subseteq \Omega$.

Note that we require $\mu A$ to be defined for every subset $A \subseteq \Omega$. We haven't mentioned a
base $\sigma$-algebra. But there is one:

Theorem 4.6.7 Let $\mu$ be an outer measure on $\Omega$ and let $\mathcal{M}(\mu)$ be the family of all $\mu$-measurable
sets. Then $(\Omega, \mathcal{M}(\mu), \mu|\mathcal{M}(\mu))$ is a measure space.

Proof: We must show that $\mathcal{M}(\mu)$ is a $\sigma$-algebra, and that $\mu$ is countably additive on $\mathcal{M}(\mu)$.

Certainly $\Omega$ is $\mu$-measurable. Also, it is obvious that $A \in \mathcal{M}(\mu)$ implies $A^c \in \mathcal{M}(\mu)$.
Next, we show that $\mathcal{M}(\mu)$ is closed under finite intersections: Let $A, B \in \mathcal{M}(\mu)$, and let
$E \subseteq \Omega$. Then

$\mu E = \mu(E \cap A) + \mu(E \cap A^c)$
$= \mu(E \cap A \cap B) + \mu(E \cap A \cap B^c) + \mu(E \cap A^c)$
$= \mu(E \cap (A \cap B)) + \mu(E \cap (A \cap B)^c)$
$\ge \mu E$

where the third line follows because $\mu(E \cap (A \cap B)^c) = \mu(E \cap (A \cap B)^c \cap A) + \mu(E \cap (A \cap B)^c \cap A^c) =
\mu(E \cap B^c \cap A) + \mu(E \cap A^c)$, and the final line holds because $\mu$ is sub-additive. It follows that
$A \cap B \in \mathcal{M}(\mu)$. Hence $\mathcal{M}(\mu)$ is an algebra.

Next, let $A, B \in \mathcal{M}(\mu)$ be disjoint, and let $E \subseteq \Omega$. Then $B \subseteq A^c$, and hence

$\mu(E \cap (A \cup B)) = \mu(E \cap (A \cup B) \cap A) + \mu(E \cap (A \cup B) \cap A^c) = \mu(E \cap A) + \mu(E \cap B)$   (*)

Specializing to $E = \Omega$ proves that $\mu$ is finitely additive on $\mathcal{M}(\mu)$.

Now let $A_1, A_2, \dots$ be a countable sequence of disjoint elements of $\mathcal{M}(\mu)$, and let $E \subseteq \Omega$.
Define $B_n = \bigcup_{m \le n} A_m$ and $A = \bigcup_n A_n = \bigcup_n B_n$. Then by monotonicity and (*), we see
that

$\mu(E \cap A) \ge \mu(E \cap B_n) = \sum_{m \le n} \mu(E \cap A_m)$

As $n \to \infty$, and invoking the fact that $\mu$ is subadditive, we see that

$\mu(E \cap A) \ge \sum_n \mu(E \cap A_n) \ge \mu(E \cap A)$   (**)

so that equality holds throughout. Countable additivity of $\mu$ on $\mathcal{M}(\mu)$ is then obtained by
specializing to the case $E = \Omega$.

Also, since $B_n \in \mathcal{M}(\mu)$, we see, by monotonicity and (**), that

$\mu E = \mu(E \cap B_n) + \mu(E \cap B_n^c)$
$\ge \sum_{m \le n} \mu(E \cap A_m) + \mu(E \cap A^c)$
$\to \mu(E \cap A) + \mu(E \cap A^c)$ as $n \to \infty$
$\ge \mu E$

Hence equality holds throughout, and so $A \in \mathcal{M}(\mu)$. This proves that $\mathcal{M}(\mu)$ is a $\sigma$-algebra.

⊣

Here is one of the most important ways of obtaining outer measures: 



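Proposition 4.6.8 Let $\mathcal{A}$ be an algebra on a set $\Omega$, and let $\mu_0$ be a non-negative countably
additive set function on $\mathcal{A}$. Define $\mu^* : \mathcal{P}(\Omega) \to [0, +\infty]$ by

$$\mu^* E = \inf \Big\{ \sum_n \mu_0 A_n : (A_n)_n \text{ a sequence of sets in } \mathcal{A} \text{ with } E \subseteq \bigcup_n A_n \Big\}$$

Then $\mu^*$ is an outer measure on $\Omega$ which extends $\mu_0$.
Moreover, $\mathcal{A} \subseteq \mathcal{M}(\mu^*)$.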

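Proof: To see that $\mu^*$ extends $\mu_0$ is easy: For let $A \in \mathcal{A}$. Firstly, if $(A_n)_n$ is a sequence in $\mathcal{A}$
which covers $A$, then $\mu_0 A \le \sum_n \mu_0 A_n$ (by countable additivity of $\mu_0$), and thus $\mu_0 A \le \mu^* A$.
The sequence $(A, \emptyset, \emptyset, \dots)$ witnesses the fact that $\mu^* A \le \mu_0 A$ as well.
Next, we prove that $\mu^*$ is an outer measure on $\Omega$.

(i) $\mu^* \emptyset = \mu_0 \emptyset = 0$;

(ii) If $E \subseteq F$, then any $\mathcal{A}$-covering of $F$ is also a covering of $E$, and thus $\mu^* E \le \mu^* F$.

(iii) Finally, suppose that $E_n \subseteq \Omega$ for $n \in \mathbb{N}$, and let $E = \bigcup_n E_n$. We must show that

$$\mu^* E \le \sum_n \mu^* E_n \qquad (*)$$

Fix $\varepsilon > 0$. For each $n$ choose a sequence $(A_{n,k})_k$ in $\mathcal{A}$ which covers $E_n$ and such that

$$\sum_k \mu_0 A_{n,k} < \mu^* E_n + \varepsilon 2^{-n}$$

Then $\{A_{n,k} : n, k \in \mathbb{N}\}$ is a countable $\mathcal{A}$-covering of $E$, and thus

$$\mu^* E \le \sum_{n,k} \mu_0 A_{n,k} = \sum_n \sum_k \mu_0 A_{n,k} \le \sum_n \big( \mu^* E_n + \varepsilon 2^{-n} \big) = \sum_n \mu^* E_n + \varepsilon$$

Since $\varepsilon$ is arbitrary ($> 0$), (*) follows.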






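Next, we show that every member of $\mathcal{A}$ is $\mu^*$-measurable. So let $B \in \mathcal{A}$, and let $E \subseteq \Omega$
be arbitrary. Let $\varepsilon > 0$. Choose a sequence $(A_n)_n$ in $\mathcal{A}$ which covers $E$, and such that

$$\sum_n \mu_0 A_n < \mu^* E + \varepsilon$$

Then $(A_n \cap B)_n$, $(A_n \cap B^c)_n$ are, respectively, sequences in $\mathcal{A}$ which cover $E \cap B$ and $E \cap B^c$.
It follows that

$$\mu^*(E \cap B) + \mu^*(E \cap B^c) \le \sum_n \big( \mu_0(A_n \cap B) + \mu_0(A_n \cap B^c) \big) = \sum_n \mu_0 A_n < \mu^* E + \varepsilon$$

Since $\varepsilon > 0$ is arbitrary, it follows that

$$\mu^*(E \cap B) + \mu^*(E \cap B^c) = \mu^* E \qquad \text{for all } E \subseteq \Omega$$

(we proved $\le$, but $\ge$ always holds, by sub-additivity.) Hence $B$ is $\mu^*$-measurable, for every
$B \in \mathcal{A}$.

⊣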
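
The covering construction of Propn. 4.6.8 and the measurability condition of Thm. 4.6.7 can be checked by brute force on a small finite example. The sketch below is only an illustration under assumed data: the four-point set `OMEGA`, the two blocks, and the weights in `MU0` are hypothetical and do not come from the text. The algebra is the one generated by the two blocks, so the cheapest cover of a set $E$ consists of exactly those blocks that meet $E$.

```python
from itertools import combinations

# Hypothetical toy data: Omega = {0,1,2,3}, algebra generated by two blocks.
OMEGA = frozenset({0, 1, 2, 3})
BLOCKS = [frozenset({0, 1}), frozenset({2, 3})]
MU0 = {BLOCKS[0]: 1, BLOCKS[1]: 2}  # mu_0 on the atoms of the algebra


def powerset(universe):
    """All subsets of a finite set, as frozensets."""
    items = sorted(universe)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


def outer_measure(E):
    """mu*(E): the cheapest cover of E uses exactly the blocks meeting E."""
    return sum(MU0[b] for b in BLOCKS if b & E)


def is_measurable(A):
    """Caratheodory condition: mu*(E) = mu*(E & A) + mu*(E - A) for all E."""
    return all(
        outer_measure(E) == outer_measure(E & A) + outer_measure(E - A)
        for E in powerset(OMEGA)
    )


measurable = [A for A in powerset(OMEGA) if is_measurable(A)]
```

In this toy case the measurable sets turn out to be exactly the members of the generating algebra; a set such as `{0}` fails the test with `E = {0, 1}`.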



Theorem 4.6.9 (Caratheodory's Extension Theorem)

Let $\mathcal{A}$ be an algebra on a set $\Omega$, and let $\mu_0$ be a countably additive non-negative set
function$^a$ on $\mathcal{A}$, such that $\mu_0 \emptyset = 0$. Then $\mu_0$ extends to a measure $\mu$ on $\sigma(\mathcal{A})$, i.e. there
is a measure $\mu$ on $(\Omega, \sigma(\mathcal{A}))$ such that $\mu_0 A = \mu A$ for all $A \in \mathcal{A}$.
Moreover, if $\mu_0$ is $\sigma$-finite, then the extension $\mu$ of $\mu_0$ is unique.

$^a$ This means that if $A_1, A_2, \dots \in \mathcal{A}$ are mutually disjoint, and if also $\bigcup_n A_n \in \mathcal{A}$, then $\mu_0(\bigcup_n A_n) = \sum_n \mu_0 A_n$.



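Proof: Let $\mathcal{F} = \sigma(\mathcal{A})$, and define $\mu^*$ on $(\Omega, \mathcal{P}(\Omega))$ by

$$\mu^* E = \inf \Big\{ \sum_n \mu_0 A_n : (A_n)_n \text{ a sequence of sets in } \mathcal{A} \text{ with } E \subseteq \bigcup_n A_n \Big\}$$

Then by Propn. 4.6.8, $\mu^*$ is an outer measure which extends $\mu_0$. By Theorem 4.6.7, $\mu^*$ is
a measure on the $\sigma$-algebra $\mathcal{M}(\mu^*)$ of all $\mu^*$-measurable sets, and $\mathcal{A} \subseteq \mathcal{M}(\mu^*)$. Hence also
$\sigma(\mathcal{A}) = \mathcal{F} \subseteq \mathcal{M}(\mu^*)$, and $\mu = \mu^*|\mathcal{F}$ is a measure on $(\Omega, \mathcal{F})$.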

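We now turn to the uniqueness of the extension. First suppose that $\mu_0 \Omega < \infty$, and
suppose that $\nu$ is another extension of $\mu_0$ to $\mathcal{F}$. Since $\Omega \in \mathcal{A}$, we have $\nu\Omega = \mu\Omega$. We must
show that $\mu F = \nu F$ for all $F \in \mathcal{F}$. Let $(A_n)_n$ be an $\mathcal{A}$-cover for $F$. Then

$$\nu F \le \sum_n \nu A_n = \sum_n \mu_0 A_n$$

because $\nu|\mathcal{A} = \mu_0$. Since $(A_n)_n$ was an arbitrary $\mathcal{A}$-cover of $F$, we must have $\nu F \le \mu^* F = \mu F$,
for all $F \in \mathcal{F}$. Thus also $\nu F^c \le \mu F^c$, and hence

$$\nu\Omega = \nu F + \nu F^c \le \mu F + \mu F^c = \mu\Omega = \nu\Omega$$

Equality therefore holds throughout. Since all the quantities involved are finite, $\nu F < \mu F$ would force a strict inequality in the middle, so $\nu F = \mu F$ for all $F \in \mathcal{F}$.

Finally, assume that $\mu_0$ is $\sigma$-finite on $(\Omega, \mathcal{A})$. Then there exists a sequence $A_n \in \mathcal{A}$
such that $A_n \uparrow \Omega$, and such that $\mu A_n < \infty$ (for all $n \in \mathbb{N}$). As above, we can prove that
$\mu(F \cap A_n) = \nu(F \cap A_n)$, for all $n \in \mathbb{N}$, and using Propn. 4.3.4, we see that $\nu F = \mu F$ in this
case also.

⊣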






4.6.3 Completion of Measure Spaces* 

Suppose that $(\Omega, \mathcal{F}, \mu)$ is a measure space. It may be possible to extend the measure $\mu$ to a
class of sets larger than $\mathcal{F}$, where the measure of the added new sets is determined by $\mu$ and
$\mathcal{F}$. For example, suppose that

(i) $F \in \mathcal{F}$ is such that $\mu F = 0$;

(ii) $A \subseteq F$

then "clearly" $\mu A = 0$ also. However, if $A \notin \mathcal{F}$ then $\mu A$ isn't defined. Yet, $\mu A$ "ought" to
be zero. By adding all those sets whose measure "ought" to be zero, we get a new $\sigma$-algebra
$\overline{\mathcal{F}}$, called the completion of $\mathcal{F}$ w.r.t. $\mu$.

Suppose that $(\Omega, \mathcal{F}, \mu)$ is a measure space, and let $\mu^*$ be the outer measure generated by
$\mu$ (cf. Propn. 4.6.8), i.e.

$$\mu^* E = \inf \Big\{ \sum_n \mu F_n : (F_n)_n \text{ a sequence of sets in } \mathcal{F} \text{ with } E \subseteq \bigcup_n F_n \Big\}$$

Because we are now dealing with a $\sigma$-algebra $\mathcal{F}$, rather than an algebra $\mathcal{A}$ (as in Propn.
4.6.8) we actually have:

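Proposition 4.6.10

$$\mu^* E = \min\{\mu F : F \in \mathcal{F} \wedge E \subseteq F\}$$

□

Exercise 4.6.11 Prove Propn. 4.6.10.

[Hint: Note first that if $E \subseteq \bigcup_n F_n$, where each $F_n \in \mathcal{F}$, and if $F = \bigcup_n F_n$, then

$$E \subseteq F \quad \text{and} \quad \mu F \le \sum_n \mu F_n$$

Conclude that

$$\mu^* E = \inf\{\mu F : E \subseteq F \in \mathcal{F}\}$$

Now choose for each $m \in \mathbb{N}$ a $G_m \in \mathcal{F}$ such that $E \subseteq G_m$ and $\mu G_m < \mu^* E + \frac{1}{m}$. (Why can we do this?) Put
$G = \bigcap_m G_m$, and show that $\mu G = \mu^* E$.]

□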

The following lemma is obvious: 

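Lemma 4.6.12 Let $(\Omega, \mathcal{F}, \mu)$ be a measure space, and let $\mu^*$ be the outer measure generated by $\mu$ (via Propn. 4.6.8). Then every $\mu$-null set belongs to $\mathcal{M}(\mu^*)$.

Definition and Proposition 4.6.13 Let $(\Omega, \mathcal{F}, \mu)$ be a measure space. Let

$$\mathcal{N} := \{N \subseteq \Omega : \exists F \in \mathcal{F}\, [\mu F = 0 \wedge N \subseteq F]\}$$

be the set of $\mu$-null sets (cf. Definition 6.3.1). Then

$$\overline{\mathcal{F}} := \{F \cup N : F \in \mathcal{F}, N \in \mathcal{N}\}$$

is a $\sigma$-algebra, called the completion of $\mathcal{F}$ w.r.t. $\mu$.

Moreover, if $\mu^*$ is the outer measure generated by $\mu$ (via Propn. 4.6.8), then $\overline{\mathcal{F}} \subseteq \mathcal{M}(\mu^*)$,
and $\mu^*|\overline{\mathcal{F}}$ is a measure on $(\Omega, \overline{\mathcal{F}})$ which extends $\mu$.

Finally, if $(\Omega, \mathcal{F}, \mu)$ is a $\sigma$-finite measure space, then $\overline{\mathcal{F}} = \mathcal{M}(\mu^*)$.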






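Proof: We first show that $\overline{\mathcal{F}}$ is a $\sigma$-algebra. That $\overline{\mathcal{F}}$ is closed under countable unions follows
straightforwardly from the fact that both $\mathcal{F}$ and $\mathcal{N}$ are closed under countable unions. To
check that $\overline{\mathcal{F}}$ is closed under complementation, suppose that $F \cup N \in \overline{\mathcal{F}}$, where $F \in \mathcal{F}$, $N \in \mathcal{N}$.
Choose $G \in \mathcal{F}$ such that $\mu G = 0$ and $N \subseteq G$. Then

$$(F \cup N)^c = (F \cup G)^c \cup [G - (F \cup N)]$$

Now $(F \cup G)^c \in \mathcal{F}$, and $G - (F \cup N) \in \mathcal{N}$ (being a subset of $G$). Hence $(F \cup N)^c \in \overline{\mathcal{F}}$, proving
that $\overline{\mathcal{F}}$ is a $\sigma$-algebra.

Clearly $\overline{\mathcal{F}} = \sigma(\mathcal{F} \cup \mathcal{N})$. Since $\mathcal{M}(\mu^*)$ is a $\sigma$-algebra which includes both $\mathcal{F}$ and $\mathcal{N}$ (by
Thm. 4.6.7, Propn. 4.6.8 and Lemma 4.6.12) it follows that $\overline{\mathcal{F}} \subseteq \mathcal{M}(\mu^*)$. Since $\mu^*|\mathcal{M}(\mu^*)$ is
a measure (cf. Thm. 4.6.7), so is $\mu^*|\overline{\mathcal{F}}$.

Finally, we show that if $(\Omega, \mathcal{F}, \mu)$ is $\sigma$-finite, then also $\mathcal{M}(\mu^*) \subseteq \overline{\mathcal{F}}$. Choose $G_n \in \mathcal{F}$ such
that $\Omega = \bigcup_n G_n$, with $\mu G_n < \infty$ for all $n \in \mathbb{N}$, and let $A \in \mathcal{M}(\mu^*)$. It suffices to show that
each $A \cap G_n \in \overline{\mathcal{F}}$. Hence we may assume that $\mu$ is a finite measure. By Propn. 4.6.10, there
is $F \in \mathcal{F}$ such that $A \subseteq F$ and $\mu^* A = \mu F$. Then

$$\mu^*(A) = \mu F = \mu^* F = \mu^*(F \cap A) + \mu^*(F \cap A^c) = \mu^* A + \mu^*(F - A)$$

Hence $\mu^*(F - A) = 0$ (where we use the fact that $\mu$ is finite). It follows, again by Propn.
4.6.10, that $F - A$ is a $\mu$-null set. Thus $A = F - (F - A) \in \overline{\mathcal{F}}$.

⊣

It is easy to see that $\overline{\mathcal{F}} = \sigma(\mathcal{F} \cup \mathcal{N})$, the smallest $\sigma$-algebra which contains each member
of $\mathcal{F}$ and also all the null sets.

Exercise 4.6.14 Show that if $(S, \mathcal{F}, \mu)$ has completion $(S, \overline{\mathcal{F}}, \overline{\mu})$, then

$$\overline{\mathcal{F}} = \{A \subseteq S : A \triangle F \text{ is a } \mu\text{-null set, for some } F \in \mathcal{F}\}$$

[Hint: Let $\mathcal{N}$ be the family of null sets, let $\mathcal{G} = \{A \subseteq S : A \triangle F \in \mathcal{N} \text{ for some } F \in \mathcal{F}\}$, and let $\overline{\mathcal{F}} = \sigma(\mathcal{F} \cup \mathcal{N}) = \{F \cup N : F \in \mathcal{F}, N \in \mathcal{N}\}$. First show that $\mathcal{F}, \mathcal{N} \subseteq \mathcal{G}$, and conclude that $\overline{\mathcal{F}} \subseteq \mathcal{G}$. Next, note that if $A \triangle F \in \mathcal{N}$
for some $F \in \mathcal{F}$, then $A = (F - (F - A)) \cup (A - F)$, where $F \in \mathcal{F}$ and $F - A, A - F \in \mathcal{N}$.]

□

4.7 Lebesgue Measure
4.7.1 Lebesgue measure on $\mathbb{R}$

In this section, we construct the natural notion of length, namely Lebesgue measure on $\mathbb{R}$.

First, we define the Lebesgue outer measure $\lambda^* : \mathcal{P}(\mathbb{R}) \to [0, +\infty]$ (cf. Defn. 4.6.6 for the
definition of outer measure). Given an interval $I \subseteq \mathbb{R}$, define $|I|$ to be the length of the
interval. Then, for $A \subseteq \mathbb{R}$, define

$$\lambda^*(A) = \inf \Big\{ \sum_{n=1}^{\infty} |I_n| : \text{each } I_n \text{ an interval}, A \subseteq \bigcup_n I_n \Big\}$$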






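Remarks 4.7.1 Note that we also have

$$\lambda^*(A) = \inf \Big\{ \sum_{n=1}^{\infty} |I_n| : \text{each } I_n \text{ a finite open interval}, A \subseteq \bigcup_n I_n \Big\}$$

For let $\tilde{\lambda} A = \inf \{ \sum_{n=1}^{\infty} |I_n| : \text{each } I_n \text{ a finite open interval}, A \subseteq \bigcup_n I_n \}$. Then clearly $\lambda^* A \le \tilde{\lambda} A$. To prove
the reverse inequality, fix $\varepsilon > 0$, and choose intervals $I_n$ covering $A$ such that $\sum_n |I_n| < \lambda^* A + \frac{\varepsilon}{2}$. If $|I_n| = +\infty$ for some
$n \in \mathbb{N}$, then clearly $\lambda^* A = +\infty$, in which case $\tilde{\lambda} A = \lambda^* A$ (i.e. there is nothing to prove). Hence we may assume
that each $I_n$ is a finite interval. Now let $J_n$ be an open interval such that $I_n \subseteq J_n$ and $|J_n| < |I_n| + \varepsilon 2^{-n-1}$.
Then $\tilde{\lambda} A \le \sum_n |J_n| < \sum_n |I_n| + \frac{\varepsilon}{2} < \lambda^* A + \varepsilon$. Letting $\varepsilon \downarrow 0$, we see that $\tilde{\lambda} A \le \lambda^* A$, as required.

□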



Proposition 4.7.2 $\lambda^*$ is an outer measure on $\mathbb{R}$, and $\lambda^* I = |I|$ for every interval $I$.



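Proof: It is clear that $\lambda^*$ is a monotone increasing non-negative function with $\lambda^*(\emptyset) = 0$. To
prove that $\lambda^*$ is also countably sub-additive, let $A_1, A_2, \dots \subseteq \mathbb{R}$, and fix an arbitrary $\varepsilon > 0$.
By definition of $\lambda^*$ we may, for each $n \in \mathbb{N}$, choose open intervals $I_{n,1}, I_{n,2}, \dots$ such that

$$A_n \subseteq \bigcup_k I_{n,k} \qquad \sum_k |I_{n,k}| \le \lambda^* A_n + \varepsilon 2^{-n} \qquad \text{all } n \in \mathbb{N}$$

Then the intervals $I_{n,k}$ ($n, k \in \mathbb{N}$) cover $\bigcup_n A_n$, so that

$$\lambda^*\Big(\bigcup_n A_n\Big) \le \sum_{n,k} |I_{n,k}| \le \sum_n \big( \lambda^* A_n + \varepsilon 2^{-n} \big) = \sum_n \lambda^* A_n + \varepsilon$$

Since $\varepsilon$ was arbitrary ($> 0$), it follows that $\lambda^*(\bigcup_n A_n) \le \sum_n \lambda^* A_n$.

It remains to show that $\lambda^* I = |I|$ for every interval $I \subseteq \mathbb{R}$. It is obvious that we always
have $\lambda^* I \le |I|$. To prove the reverse inequality, first assume that $I$ is a compact interval (i.e.
$I = [a, b]$, for some $-\infty < a \le b < +\infty$). Choose finite open intervals $(a_1, b_1), (a_2, b_2), \dots$
such that $[a, b] \subseteq \bigcup_n (a_n, b_n)$ (cf. Remarks 4.7.1). By compactness, there is $n$ such that

$$[a, b] \subseteq \bigcup_{k=1}^{n} (a_k, b_k)$$

We now check by induction on $n$ that $b - a < \sum_{k=1}^{n} (b_k - a_k)$. This is obvious if $n = 1$.
Now suppose that the assertion is true for $n - 1$, and let $(a_1, b_1), \dots, (a_n, b_n)$ be a cover for
$[a, b]$. Without loss of generality (relabelling if necessary), we may assume that $b \in (a_n, b_n)$.
If $a_n < a$, then already $b - a < b_n - a_n$. Otherwise $(a_1, b_1), \dots, (a_{n-1}, b_{n-1})$ is a cover of the interval $[a, a_n]$. By induction hypothesis, we
have $a_n - a < \sum_{k=1}^{n-1} (b_k - a_k)$, and hence

$$b - a = (b - a_n) + (a_n - a) < (b_n - a_n) + \sum_{k=1}^{n-1} (b_k - a_k) = \sum_{k=1}^{n} (b_k - a_k)$$

as required.

It follows that $b - a < \sum_{k=1}^{\infty} (b_k - a_k)$. Since the $(a_n, b_n)$ were an arbitrary open covering
of $[a, b]$, it follows that $b - a \le \lambda^* [a, b]$, i.e. that $|I| \le \lambda^* I$ for every compact interval $I$.

It remains to deal with intervals that are non-compact. If $I$ is a bounded interval, then
there is a compact interval $J$ such that $J \subseteq I$ and $|I| < |J| + \varepsilon$ (where we fix $\varepsilon > 0$). It
follows that $|I| < \lambda^* J + \varepsilon$. Now since also $\lambda^* J \le \lambda^* I$, we must have $|I| < \lambda^* J + \varepsilon \le \lambda^* I + \varepsilon$.
Letting $\varepsilon \downarrow 0$, we see that $|I| \le \lambda^* I$ if $I$ is a bounded interval.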






Finally, if $I$ is an unbounded interval, then $\lambda^* I = +\infty$: Indeed, if $K > 0$ is arbitrary,
there is a compact interval $J \subseteq I$ such that $|J| > K$. Then

$$\lambda^* I \ge \lambda^* J > K$$

Letting $K \uparrow \infty$, we see that $\lambda^* I = +\infty = |I|$ when $I$ is unbounded.

⊣
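
The $\varepsilon 2^{-n}$ device used for countable sub-additivity can be illustrated numerically. The sketch below is a hedged illustration, not part of the construction: it covers the first few rationals of $[0, 1]$ (in an enumeration chosen here for convenience) by open intervals whose total length stays below a prescribed $\varepsilon$, which is the standard argument that a countable set has Lebesgue outer measure zero. The function names are hypothetical.

```python
from fractions import Fraction


def rationals_in_unit_interval(count):
    """First `count` distinct rationals p/q in [0, 1], ordered by denominator."""
    seen, out, q = set(), [], 1
    while len(out) < count:
        for p in range(q + 1):
            r = Fraction(p, q)
            if r not in seen:
                seen.add(r)
                out.append(r)
                if len(out) == count:
                    break
        q += 1
    return out


def cover(points, eps):
    """Give the k-th point (k = 1, 2, ...) an open interval of length eps * 2**-k."""
    intervals = []
    for k, x in enumerate(points, start=1):
        half_width = eps / 2 ** (k + 1)
        intervals.append((x - half_width, x + half_width))
    return intervals


eps = Fraction(1, 100)
points = rationals_in_unit_interval(50)
intervals = cover(points, eps)
total_length = sum(b - a for a, b in intervals)  # equals eps * (1 - 2**-50)
```

Exact rational arithmetic is used so that the bound `total_length < eps` holds without rounding concerns.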

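Now that we have constructed an outer measure $\lambda^*$, it follows by Thm. 4.6.7 that there
is a $\sigma$-algebra $\mathcal{L}(\mathbb{R})$ on $\mathbb{R}$ such that $\lambda = \lambda^*|\mathcal{L}(\mathbb{R})$ is a measure on $(\mathbb{R}, \mathcal{L}(\mathbb{R}))$. Indeed, we
have $\mathcal{L}(\mathbb{R}) := \mathcal{M}(\lambda^*)$, the family of all $\lambda^*$-measurable sets. The $\sigma$-algebra $\mathcal{L}(\mathbb{R})$ is called the
$\sigma$-algebra of all Lebesgue measurable sets, or the Lebesgue algebra, on $\mathbb{R}$.

Our next aim is to show that $\mathcal{B}(\mathbb{R}) \subseteq \mathcal{L}(\mathbb{R})$, i.e. that every Borel set is Lebesgue measurable.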



Proposition 4.7.3 Every Borel set is Lebesgue measurable. 



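Proof: It suffices to prove that every interval of the form $(-\infty, a]$ is Lebesgue measurable,
because the collection of intervals of this form generates $\mathcal{B}(\mathbb{R})$. Let $E \subseteq \mathbb{R}$ be arbitrary, and
let $I = (-\infty, a]$. Fix $\varepsilon > 0$, and choose intervals $I_1, I_2, \dots$ covering $E$ such that $\sum_n |I_n| < \lambda^* E + \varepsilon$. Note
that if $J$ is an arbitrary interval, then so are $J \cap I$, $J \cap I^c$, and $|J| = |J \cap I| + |J \cap I^c|$.
Now

$$\begin{aligned}
\lambda^* E + \varepsilon &> \sum_n |I_n| \\
&= \sum_n |I_n \cap I| + \sum_n |I_n \cap I^c| \\
&\ge \lambda^*(E \cap I) + \lambda^*(E \cap I^c)
\end{aligned}$$

Letting $\varepsilon \downarrow 0$, we see that $\lambda^* E \ge \lambda^*(E \cap I) + \lambda^*(E \cap I^c)$, for all $E \subseteq \mathbb{R}$.

⊣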

We have almost proved the following theorem: 

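Theorem 4.7.4 There exists a unique measure $\lambda$ on $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$ such that $\lambda I = |I|$ for every
interval $I$.

Proof: By Propn. 4.7.2, $\lambda^*$ is an outer measure with $\lambda^* I = |I|$ for every interval, and thus
it follows by Thm. 4.6.7 that there is a $\sigma$-algebra $\mathcal{L}(\mathbb{R})$ on $\mathbb{R}$ such that $\lambda = \lambda^*|\mathcal{L}(\mathbb{R})$ is a
measure on $(\mathbb{R}, \mathcal{L}(\mathbb{R}))$. By Propn. 4.7.3, $\mathcal{B}(\mathbb{R}) \subseteq \mathcal{L}(\mathbb{R})$.

It remains to prove uniqueness: Suppose that $\mu$ is another measure on $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$ with the
property that $\mu I = |I|$ for all intervals $I$. Let $I_n = [-n, n]$. Then $\lambda_n := \lambda|I_n$, $\mu_n := \mu|I_n$ are
finite measures on $(I_n, \mathcal{B}(I_n))$. Now $\mathcal{C}_n = \{J : J \text{ an interval}, J \subseteq I_n\}$ is a $\pi$-system which
generates $\mathcal{B}(I_n)$, and $\lambda_n, \mu_n$ agree on $\mathcal{C}_n$. Hence, by Propn. 4.6.4, $\lambda_n, \mu_n$ agree on $\mathcal{B}(I_n)$.

Now, if $B \in \mathcal{B}(\mathbb{R})$, then by Propn. 4.3.4,

$$\lambda B = \lim_{n \to \infty} \lambda_n(B \cap I_n) = \lim_{n \to \infty} \mu_n(B \cap I_n) = \mu B$$

⊣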






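Remarks 4.7.5 It is easy to see that Lebesgue measure can be extended to the extended real number
system $\overline{\mathbb{R}} = [-\infty, +\infty]$. The Borel algebra on $\overline{\mathbb{R}}$ is generated by all sets of the form

$$[-\infty, a) \qquad (a, b) \qquad (b, +\infty] \qquad a, b \in \mathbb{R}$$

(which form a base for the topology on $\overline{\mathbb{R}}$). Thus $\mathcal{B}(\overline{\mathbb{R}})$ is the family of all sets of the form

$$B \qquad B \cup \{+\infty\} \qquad B \cup \{-\infty\} \qquad B \cup \{+\infty, -\infty\}$$

where $B$ is an ordinary Borel set.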

□ 

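4.7.2 Lebesgue Measure on $\mathbb{R}^d$

It is possible to modify the construction of Lebesgue measure on $\mathbb{R}$ to obtain a unique measure
on $(\mathbb{R}^d, \mathcal{B}(\mathbb{R}^d))$, also denoted by $\lambda$, which assigns to every rectangle its volume: A rectangle in
$\mathbb{R}^d$ is a set

$$R = I_1 \times I_2 \times \dots \times I_d \qquad \text{where } I_1, I_2, \dots, I_d \text{ are intervals in } \mathbb{R}$$

The volume of such a rectangle is defined by $\mathrm{vol}(R) = |I_1| \times |I_2| \times \dots \times |I_d|$.
We can then define a map $\lambda^* : \mathcal{P}(\mathbb{R}^d) \to [0, +\infty]$ by

$$\lambda^*(A) = \inf \Big\{ \sum_{n=1}^{\infty} \mathrm{vol}(R_n) : \text{each } R_n \text{ a rectangle}, A \subseteq \bigcup_n R_n \Big\}$$

and show that $\lambda^*$ is an outer measure, and that every Borel set is $\lambda^*$-measurable.

Instead of performing the construction outlined above, we elect to wait until we have
constructed products of measure spaces: Given measure spaces $(\Omega_i, \mathcal{F}_i, \mu_i)$, where $i = 1, \dots, n$,
it is possible to construct, in a canonical way, a measure space

$$\Big( \prod_{i \le n} \Omega_i, \ \bigotimes_{i \le n} \mathcal{F}_i, \ \bigotimes_{i \le n} \mu_i \Big)$$

It will turn out that

$$(\mathbb{R}^d, \mathcal{B}(\mathbb{R}^d), \lambda^d) = \Big( \prod_{i \le d} \mathbb{R}, \ \bigotimes_{i \le d} \mathcal{B}(\mathbb{R}), \ \bigotimes_{i \le d} \lambda \Big)$$

where $\lambda^d$ denotes the $d$-dimensional Lebesgue measure.

Thus it is possible to construct $(\mathbb{R}^d, \mathcal{B}(\mathbb{R}^d), \lambda^d)$ directly from $(\mathbb{R}, \mathcal{B}(\mathbb{R}), \lambda)$, and a repetition
of the construction of Lebesgue measure proves unnecessary.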






Chapter 5 

Measurable Functions and Random 
Variables 

5.1 Definition of Measurable Function 

Let $f : A \to S$ be a map between two sets. Recall that $f$ induces a set map $f^{-1} : \mathcal{P}(S) \to \mathcal{P}(A)$
between the power sets (in the opposite direction) by

$$f^{-1}[T] = \{a \in A : f(a) \in T\}$$

Remarks 5.1.1 Here is some motivation for the definition of measurable function. 

Suppose that $X$ is a random variable on a probability space $(\Omega, \mathcal{F}, \mathbb{P})$, i.e. a function $X : \Omega \to \mathbb{R}$ which
assigns a number $X(\omega)$ to every outcome $\omega \in \Omega$ (we will make this notion more precise shortly). We would
like to be able to discuss the probability that $X = 0$, or that $X$ lies between $-1$ and $1$, etc. Thus we'd like to
know

$$\mathbb{P}(X = 0) = \mathbb{P}(\{\omega \in \Omega : X(\omega) = 0\}) = \mathbb{P}(X^{-1}\{0\})$$

$$\mathbb{P}(-1 \le X \le 1) = \mathbb{P}(\{\omega \in \Omega : -1 \le X(\omega) \le 1\}) = \mathbb{P}(X^{-1}[-1, 1])$$

However, $\mathbb{P}(F)$ makes sense only if $F \in \mathcal{F}$. Thus, in order to be able to discuss the above probabilities, it is
necessary that the sets

$$X^{-1}\{0\} := \{\omega \in \Omega : X(\omega) = 0\} \qquad X^{-1}[-1, 1] = \{\omega \in \Omega : X(\omega) \in [-1, 1]\}$$

belong to $\mathcal{F}$.

More generally, given a Borel set $B$, we want to be able to discuss the probability that the outcome $X(\omega)$
belongs to $B$. For

$$\mathbb{P}(X \in B) := \mathbb{P}(\{\omega \in \Omega : X(\omega) \in B\})$$

to make sense, it is necessary that the set

$$X^{-1}B = \{\omega \in \Omega : X(\omega) \in B\}$$

belongs to $\mathcal{F}$.

Thus: We can only meaningfully discuss the possible values of the random variable $X$ in a probabilistic
setting if $X^{-1}B \in \mathcal{F}$ for every $B \in \mathcal{B}(\mathbb{R})$, i.e. it is necessary that

$$X^{-1}\mathcal{B}(\mathbb{R}) \subseteq \mathcal{F}$$

□ 

We begin with a little set theory: 






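Proposition 5.1.2 If $T, T_n \subseteq S$, then

(a) $f^{-1}[T^c] = (f^{-1}[T])^c$

(b) $f^{-1}[\bigcup_n T_n] = \bigcup_n f^{-1}[T_n]$

(c) $f^{-1}[\bigcap_n T_n] = \bigcap_n f^{-1}[T_n]$

If $f : A \to S$, and $\mathcal{S}$ is a family of subsets of $S$, we denote by $f^{-1}\mathcal{S}$ the family of subsets
of $A$ defined by

$$f^{-1}\mathcal{S} = \{f^{-1}[T] : T \in \mathcal{S}\}$$

Proposition 5.1.3 Suppose that $(A, \mathcal{A})$, $(S, \mathcal{S})$ are measurable spaces, and that $f : A \to S$
is a map. Then

(i) $\mathcal{A}' = f^{-1}\mathcal{S}$ is a $\sigma$-algebra on $A$.

(ii) $\mathcal{S}' = \{T \subseteq S : f^{-1}[T] \in \mathcal{A}\}$ is a $\sigma$-algebra on $S$.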

Exercise 5.1.4 Prove Propn. 5.1.2 and 5.1.3. 

□ 
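
On finite sets, the identities of Propn. 5.1.2 and the pullback $\sigma$-algebra of Propn. 5.1.3(i) can be verified mechanically. The following sketch is only a sanity check on one assumed example (the map `f` and the sets `A`, `S`, `T1`, `T2` are hypothetical); it is not a proof.

```python
from itertools import combinations


def preimage(f, domain, target_subset):
    """f^{-1}[T] = {a in domain : f(a) in T}, with f given as a dict."""
    return frozenset(a for a in domain if f[a] in target_subset)


def powerset(universe):
    """All subsets of a finite set, as frozensets."""
    items = sorted(universe)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


# Hypothetical example: a map from a six-point set onto a three-point set.
A = frozenset({1, 2, 3, 4, 5, 6})
S = frozenset({"low", "mid", "high"})
f = {1: "low", 2: "low", 3: "mid", 4: "mid", 5: "high", 6: "high"}

T1 = frozenset({"low", "mid"})
T2 = frozenset({"mid", "high"})

# Propn. 5.1.2 (a), (b), (c) for these particular sets.
complement_ok = preimage(f, A, S - T1) == A - preimage(f, A, T1)
union_ok = preimage(f, A, T1 | T2) == preimage(f, A, T1) | preimage(f, A, T2)
intersection_ok = preimage(f, A, T1 & T2) == preimage(f, A, T1) & preimage(f, A, T2)

# Propn. 5.1.3 (i): pull back the power set of S along f.
pullback = {preimage(f, A, T) for T in powerset(S)}
```

Here `pullback` consists of the eight unions of the fibres `{1, 2}`, `{3, 4}`, `{5, 6}`, and it is closed under complements and unions, as the proposition predicts.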



Definition 5.1.5 (1.) Let $(A, \mathcal{A})$, $(S, \mathcal{S})$ be measurable spaces. A map $f : A \to S$ is said to
be $\mathcal{A}/\mathcal{S}$-measurable if and only if $f^{-1}\mathcal{S} \subseteq \mathcal{A}$ (i.e. $f^{-1}[T] \in \mathcal{A}$ for all $T \in \mathcal{S}$).
If the $\sigma$-algebras $\mathcal{A}, \mathcal{S}$ are obvious from context, we simply call $f$ a measurable function.

(2.) A measurable function $X$ from a probability space $(\Omega, \mathcal{F})$ to $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$ is called a
random variable.

A measurable function $X$ from a probability space $(\Omega, \mathcal{F})$ to $(\mathbb{R}^d, \mathcal{B}(\mathbb{R}^d))$ is called a
random vector.

More generally, any measurable function from a probability space to a measurable
space is called a random element.

(3.) If $S$ is a topological space, then a measurable function $f : (S, \mathcal{B}(S)) \to (\mathbb{R}, \mathcal{B}(\mathbb{R}))$ is called
a Borel function.

We are usually interested in the case where $S = \mathbb{R}$ or $\mathbb{R}^d$.

Thus we have the following pullback condition for measurability:

A function is measurable iff pullbacks of measurable sets
are measurable.

Note the similarity with the definition of continuous function: A function $f$ between
topological spaces $X, Y$ is continuous iff $f^{-1}[V]$ is an open subset of $X$ whenever $V$ is an
open subset of $Y$, i.e. iff pullbacks of open sets are open.

Remarks 5.1.6 (1) The notion of measure does not occur in the definition of measurable function/random
variable: Only the measurable spaces (= set + $\sigma$-algebra) play a role.






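(2) If $f : (A, \mathcal{A}) \to (S, \mathcal{S})$ is measurable, then

(3) If $X$ is a random variable on $(\Omega, \mathcal{F})$ and $B \subseteq \mathbb{R}$, we write

$$\{X \in B\} \quad \text{for} \quad X^{-1}B = \{\omega \in \Omega : X(\omega) \in B\}$$

(4) We will also allow extended real-valued maps $f : \Omega \to \overline{\mathbb{R}}$, where $\overline{\mathbb{R}} = [-\infty, +\infty]$.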



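Example 5.1.7 Let $(S, \mathcal{S})$ be a measurable space. For each $A \subseteq S$ define the indicator function
$I_A : S \to \mathbb{R}$ by

$$I_A(s) = \begin{cases} 1 & \text{if } s \in A \\ 0 & \text{otherwise} \end{cases}$$

If $A$ is a measurable set, i.e. $A \in \mathcal{S}$, and if $B \in \mathcal{B}(\mathbb{R})$ is a Borel set, then

$$I_A^{-1}[B] = \begin{cases} \emptyset & \text{if neither } 0, 1 \in B \\ A & \text{if } 1 \in B \text{ and } 0 \notin B \\ A^c & \text{if } 0 \in B \text{ and } 1 \notin B \\ S & \text{if both } 0, 1 \in B \end{cases}$$

It follows that $I_A^{-1}[B] \in \mathcal{S}$ for every $B \in \mathcal{B}(\mathbb{R})$, so that $I_A$ is a measurable function.

Similarly, if $I_A$ is a measurable function, then $I_A^{-1}\{1\} = A \in \mathcal{S}$, since $\{1\}$ is a Borel set. Thus:

$A$ is a measurable set if and only if $I_A$ is a measurable function.

□

The above example is important enough to restate as a Proposition: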



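Proposition 5.1.8 Suppose that $(\Omega, \mathcal{F})$ is a measurable space, and that $A \subseteq \Omega$. Then
the indicator $I_A : \Omega \to \mathbb{R}$ is a measurable function iff $A$ is a measurable set (i.e. $I_A$ is
$\mathcal{F}/\mathcal{B}(\mathbb{R})$-measurable iff $A \in \mathcal{F}$).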

□ 



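Exercise 5.1.9 Suppose that $\Omega$ is a set, that $\mathcal{F} := \{\emptyset, \Omega\}$ is the trivial $\sigma$-algebra on $\Omega$, and that $\mathcal{G} := \mathcal{P}(\Omega)$
is the powerset-algebra on $\Omega$.

(a) Determine all $\mathcal{F}/\mathcal{B}(\mathbb{R})$-measurable functions $\Omega \to \mathbb{R}$.

(b) Determine all $\mathcal{G}/\mathcal{B}(\mathbb{R})$-measurable functions $\Omega \to \mathbb{R}$.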

□ 

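Exercise 5.1.10 Suppose that $(\Omega, \mathcal{F})$, $(S, \mathcal{S})$ are measurable spaces, and that $f : \Omega \to S$ is $\mathcal{F}/\mathcal{S}$-measurable.

(1.) Show that if $\mathcal{G}$ is a $\sigma$-algebra on $\Omega$ such that $\mathcal{F} \subseteq \mathcal{G}$, then $f$ is also $\mathcal{G}/\mathcal{S}$-measurable.
(2.) Show that if $\mathcal{T}$ is a $\sigma$-algebra on $S$ such that $\mathcal{T} \subseteq \mathcal{S}$, then $f$ is also $\mathcal{F}/\mathcal{T}$-measurable.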

□ 



To check if a function is measurable, it suffices to check the pullback condition on a 
generating set: 






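Proposition 5.1.11 Suppose that $(\Omega, \mathcal{F})$, $(S, \mathcal{S})$ are measurable spaces, and that $f : \Omega \to S$
is a map. Suppose further that $\mathcal{C}$ is a family of subsets of $S$ such that $\sigma(\mathcal{C}) = \mathcal{S}$. Then $f$
is $\mathcal{F}/\mathcal{S}$-measurable iff $f^{-1}\mathcal{C} \subseteq \mathcal{F}$.

Proof: ($\Rightarrow$) is obvious: Clearly $f^{-1}\mathcal{C} \subseteq f^{-1}\mathcal{S}$, and if $f$ is measurable, then $f^{-1}\mathcal{S} \subseteq \mathcal{F}$ by
definition of measurability.

($\Leftarrow$): Let $\mathcal{T} = \{T \in \mathcal{S} : f^{-1}[T] \in \mathcal{F}\}$. Then $\mathcal{C} \subseteq \mathcal{T}$, by assumption, and $\mathcal{T}$ is a $\sigma$-algebra, by
Propn. 5.1.2. (Check this!) Hence $\mathcal{S} = \sigma(\mathcal{C}) \subseteq \mathcal{T} \subseteq \mathcal{S}$, i.e. $\mathcal{T} = \mathcal{S}$.

⊣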



Here are some special cases of Propn. 5.1.11: 


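Corollary 5.1.12 A function $f : (\Omega, \mathcal{F}) \to (\mathbb{R}, \mathcal{B}(\mathbb{R}))$ is measurable iff one of the following
conditions holds:

(a) $\{f \le c\} \in \mathcal{F}$ for all $c \in \mathbb{R}$. (Recall that $\{f \le c\} := \{\omega \in \Omega : f(\omega) \le c\}$.)

(b) $\{f < c\} \in \mathcal{F}$ for all $c \in \mathbb{R}$.

(c) $\{f \ge c\} \in \mathcal{F}$ for all $c \in \mathbb{R}$.

(d) $\{f > c\} \in \mathcal{F}$ for all $c \in \mathbb{R}$.

Proof: (a) In Propn. 5.1.11, take $(S, \mathcal{S}) = (\mathbb{R}, \mathcal{B}(\mathbb{R}))$ and $\mathcal{C}$ to be the collection of all intervals
of the form $(-\infty, c]$. We already know that these intervals generate the Borel algebra on $\mathbb{R}$.
(b), (c), (d) are proved similarly.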



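Corollary 5.1.13 If $X, S$ are topological spaces and $f : X \to S$ is continuous, then $f$ is
$\mathcal{B}(X)/\mathcal{B}(S)$-measurable.



Proof: In Propn. 5.1.11, take $\mathcal{C}$ to be the collection of all open subsets of $S$. Then $\sigma(\mathcal{C}) = \mathcal{B}(S)$, and by continuity $f^{-1}\mathcal{C}$ consists of open subsets of $X$, which belong to $\mathcal{B}(X)$.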



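Corollary 5.1.14 Any monotone function from $\mathbb{R}$ to $\mathbb{R}$ is a Borel function, i.e.
$\mathcal{B}(\mathbb{R})/\mathcal{B}(\mathbb{R})$-measurable.



Proof: If $f : \mathbb{R} \to \mathbb{R}$ is monotone, then $\{f \le c\}$ is an interval (for all $c \in \mathbb{R}$), and thus in $\mathcal{B}(\mathbb{R})$.

⊣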



5.2 Combinations of Measurable Functions 

Measurable functions can be combined in a variety of ways to form new measurable functions: 






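Proposition 5.2.1 (a) Suppose that $f, g : (\Omega, \mathcal{F}) \to \mathbb{R}$ are measurable functions and that
$a \in \mathbb{R}$. Then

$$f + g \qquad f^2 \qquad af \qquad f \cdot g \qquad f/g$$

are measurable functions, where we assume $g \ne 0$ on $\Omega$ for the case $f/g$.

(b) If $f_n : (\Omega, \mathcal{F}) \to \overline{\mathbb{R}}$ are measurable functions for $n \in \mathbb{N}$, then

$$\sup_n f_n \qquad \inf_n f_n \qquad \limsup_n f_n \qquad \liminf_n f_n$$

are measurable.

(c) If $f_n : (\Omega, \mathcal{F}) \to \mathbb{R}$ are measurable functions for $n \in \mathbb{N}$, and if $f_n \to f$ pointwise on $\Omega$,
then $f$ is measurable.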



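Proof: (a) Suppose that $f, g$ are measurable. First, we show that $f + g$ is measurable. By
Propn. 5.1.11 it suffices to show that

$$\{f + g > c\} \in \mathcal{F} \qquad \text{for all } c \in \mathbb{R}$$

Now $f(s) + g(s) > c$ iff $f(s) > c - g(s)$ iff $f(s) > q > c - g(s)$ for some $q \in \mathbb{Q}$. Thus

$$\{f + g > c\} = \bigcup_{q \in \mathbb{Q}} \big( \{f > q\} \cap \{g > c - q\} \big)$$

Now $\{f > q\}, \{g > c - q\} \in \mathcal{F}$ because $f, g$ are measurable. Since $\mathbb{Q}$ is a countable set,
$\{f + g > c\} \in \mathcal{F}$ also.

Next, we show that $f^2$ is measurable. This follows easily from Propn. 5.1.11 using the
fact that

$$\{f^2 \le c\} = \begin{cases} \{-\sqrt{c} \le f \le \sqrt{c}\} & \text{if } c \ge 0 \\ \emptyset & \text{else} \end{cases}$$

To see that $af$ is measurable is easy, e.g. if $a > 0$, then $\{af \le c\} = \{f \le \frac{c}{a}\}$.
Next, to see that $fg$ is measurable, use the polarization identity

$$fg = \tfrac{1}{4}\big[(f + g)^2 - (f - g)^2\big]$$

Finally, to see that $\frac{f}{g}$ is measurable, it suffices to see that $\frac{1}{g}$ is measurable. But if $c > 0$

$$\Big\{\tfrac{1}{g} < c\Big\} = \Big( \Big\{\tfrac{1}{c} < g\Big\} \cap \{g > 0\} \Big) \cup \Big( \Big\{\tfrac{1}{c} > g\Big\} \cap \{g < 0\} \Big)$$

Similar arguments work if $c < 0$ or $c = 0$.

(b) Note that

$$\Big\{\sup_n f_n > c\Big\} = \bigcup_n \{f_n > c\}$$

that $\inf_n f_n = -\sup_n(-f_n)$, that $\limsup_n f_n = \inf_n \sup_{k \ge n} f_k$ and that $\liminf_n f_n = -\limsup_n(-f_n)$.

(c) Note that if $f_n \to f$, then $f = \limsup_n f_n = \liminf_n f_n$.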






Exercise 5.2.2 If $f, g : (S, \mathcal{S}) \to \overline{\mathbb{R}}$ are measurable functions, then so are

$$f \vee g = \max\{f, g\} \qquad f \wedge g = \min\{f, g\}$$

because $\max\{f, g\} = \sup\{f, g\}$, etc. In particular, if $f$ is measurable, so are

$$f^+ := f \vee 0 = \max\{f, 0\} \qquad f^- := -(f \wedge 0) = \max\{-f, 0\}$$

$f^+, f^-$ are termed, respectively, the positive and negative parts of $f$ (but note that $f^- \ge 0$!). Note that

$$f = f^+ - f^- \qquad |f| = f^+ + f^-$$

In particular: if $f$ is measurable, then $|f|$ is measurable.

□ 

Exercise 5.2.3 Suppose that $(f_n)_n$ is a sequence of measurable functions from a measurable space $(S, \mathcal{S})$ to
$\overline{\mathbb{R}}$. Prove that the set $\{s \in S : \lim_n f_n(s) \text{ exists}\}$ is measurable.

□ 

Measurability, like continuity, is preserved under composition: 

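Proposition 5.2.4 If $f : (A, \mathcal{A}) \to (S, \mathcal{S})$ and $g : (S, \mathcal{S}) \to (T, \mathcal{T})$ are measurable functions,
then $g \circ f : (A, \mathcal{A}) \to (T, \mathcal{T})$ is measurable.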

Exercise 5.2.5 Prove Propn. 5.2.4. 

□ 

We have already seen that an indicator function $I_A$ is a measurable function iff $A$ is a
measurable set. It follows from Propn. 5.2.1 that linear combinations of such indicators
are measurable as well.

Definition 5.2.6 A measurable function $f : (\Omega, \mathcal{F}) \to \mathbb{R}$ is called a simple function if $\operatorname{ran} f$
is a finite set.

□

Let $(\Omega, \mathcal{F})$ be a measurable space. Recall that a finite or countably infinite sequence
$(F_n)_n$ of members of $\mathcal{F}$ is said to form a partition of $\Omega$ iff (i) the $F_n$ are mutually disjoint
(i.e. $F_n \cap F_m = \emptyset$ when $n \ne m$), and (ii) $\bigcup_n F_n = \Omega$.

Proposition 5.2.7 A measurable function $f : (\Omega, \mathcal{F}) \to \mathbb{R}$ is simple iff it is a linear
combination of measurable indicator functions:

$$f = \sum_{i=1}^{n} c_i I_{F_i}$$

Moreover, the sets $F_i \in \mathcal{F}$ can be chosen to form a partition of $\Omega$.

Proof: It is obvious that a function of the form $f = \sum_{i=1}^{n} c_i I_{F_i}$ (where each $F_i \in \mathcal{F}$) is
simple: $f$ can only take on values which are sums of finitely many of the $c_i$.

Suppose now that $f$ is simple, i.e. that $\operatorname{ran} f = \{c_1, \dots, c_n\}$ is a finite set. Define $F_i =
f^{-1}\{c_i\}$ for $i = 1, \dots, n$. Then the $F_i$ form a partition of $\Omega$, and $f = \sum_{i=1}^{n} c_i I_{F_i}$.

⊣
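
The second half of the proof of Propn. 5.2.7 is constructive, and on a finite set it can be carried out directly. The sketch below uses a hypothetical function `f` on a six-point set (chosen only for illustration) and builds the blocks $F_i = f^{-1}\{c_i\}$, then rebuilds $f$ as a linear combination of indicators.

```python
OMEGA = range(1, 7)


def f(w):
    """A hypothetical simple function on OMEGA taking the values 0, 1 and 4."""
    return (w % 3) ** 2


def canonical_partition(func, omega):
    """Map each value c in ran(func) to the block func^{-1}{c}."""
    blocks = {}
    for w in omega:
        blocks.setdefault(func(w), set()).add(w)
    return blocks


def indicator(block, w):
    """I_F(w) for a block F."""
    return 1 if w in block else 0


def reconstruct(blocks, w):
    """Evaluate sum_i c_i * I_{F_i}(w)."""
    return sum(c * indicator(F, w) for c, F in blocks.items())


blocks = canonical_partition(f, OMEGA)
```

Because the blocks are the fibres of `f`, they are automatically disjoint and cover `OMEGA`, so exactly one term of the sum in `reconstruct` is non-zero at each point.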

Simple functions play an important part in integration theory. Many important results 
are proved first for simple functions, and then extended to arbitrary measurable functions by 
taking limits. The next proposition is therefore extremely important: 

Proposition 5.2.8 (a) For any non-negative measurable function $f : (\Omega, \mathcal{F}) \to \overline{\mathbb{R}}^+$ there
exists a sequence of simple measurable functions $f_n$, $n \in \mathbb{N}$, such that $0 \le f_n \uparrow f$.
Moreover, if $f$ is bounded, we can choose the $f_n$ so that $f_n \to f$ uniformly.

(b) For any measurable function $f : (\Omega, \mathcal{F}) \to \overline{\mathbb{R}}$, there is a sequence of simple measurable
functions $f_n$ such that $f_n \to f$. Moreover, if $f$ is bounded, we can choose the $f_n$ so that
$f_n \to f$ uniformly.

Proof: (a) Define

$$f_n(s) := 2^{-n} \lfloor 2^n f(s) \rfloor \wedge n$$

where $\lfloor x \rfloor$ is the greatest integer $\le x$. This elegant definition deciphers as follows:

$$f_n := \sum_{k=0}^{\infty} k 2^{-n} I_{\{k 2^{-n} \le f < (k+1) 2^{-n}\}} \wedge n$$

which means that

If $f(s) < n$, then $f_n(s) = \frac{k}{2^n}$ exactly when $\frac{k}{2^n} \le f(s) < \frac{k+1}{2^n}$
If $f(s) \ge n$, then $f_n(s) = n$

Thus $f_n$ is simple and non-negative. Moreover, $0 \le f(s) - f_n(s) < 2^{-n}$ whenever $f(s) < n$.

Next, we show that $(f_n)_n$ is an increasing sequence. Suppose first that $f(s) < n$. Then there is a unique
$m \in \mathbb{N}$ such that $m 2^{-(n+1)} \le f(s) < (m+1) 2^{-(n+1)}$, i.e. $f_{n+1}(s) = m 2^{-(n+1)}$. If $m$ is
even, there is $k \in \mathbb{N}$ such that $m = 2k$, in which case $k 2^{-n} \le f(s) < (k+1) 2^{-n}$, i.e.
$f_n(s) = k 2^{-n} = f_{n+1}(s)$. If $m$ is odd, there is $k \in \mathbb{N}$ such that $m = 2k + 1$, in which case
$k 2^{-n} < (2k+1) 2^{-(n+1)} \le f(s) < (k+1) 2^{-n}$, i.e. $f_n(s) = k 2^{-n} < (2k+1) 2^{-(n+1)} = f_{n+1}(s)$.
Thus, whether $m$ is even or odd, $f_n(s) \le f_{n+1}(s)$. If instead $f(s) \ge n$, then $f_n(s) = n \le f_{n+1}(s)$, because $2^{n+1} f(s) \ge 2^{n+1} n$.

Next, we show that $f_n(s) \to f(s)$ for all $s$. If $f(s) = +\infty$, then $f_n(s) = n$ for all
$n \in \mathbb{N}$, so certainly $f_n(s) \to f(s)$. If $f(s) < \infty$, choose $N$ such that $f(s) < N$. If $n \ge N$, then
$0 \le f(s) - f_n(s) < 2^{-n}$, and thus $|f(s) - f_n(s)| < 2^{-n}$. Thus $f_n(s) \to f(s)$ in this case also.

Finally, if $f$ is bounded, i.e. $f < N$ for some $N \in \mathbb{N}$, then we see that $|f(s) - f_n(s)| < 2^{-n}$
for all $n \ge N$ and all $s$, i.e. $f_n \to f$ uniformly.

(b) Now let $f$ be an arbitrary measurable function to $\overline{\mathbb{R}}$. Then $f$ is the difference of two
non-negative measurable functions $f = f^+ - f^-$ (cf. Exercise 5.2.2), and thus, as in (a),
there exist non-negative simple functions $f_n^+, f_n^-$ such that $f_n^+ \uparrow f^+$, $f_n^- \uparrow f^-$. Clearly then
also $(f_n^+ - f_n^-) \to f$. Now note that if $f$ is bounded, so are $f^+, f^-$. If the $f_n^+$ and $f_n^-$ converge
uniformly, then also $(f_n^+ - f_n^-) \to f$ uniformly.

⊣
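
The dyadic approximation in part (a) is easy to experiment with numerically. The sketch below evaluates $f_n(s) = 2^{-n} \lfloor 2^n f(s) \rfloor \wedge n$ pointwise, treating the value $f(s)$ as a plain number; the function name `dyadic_approx` is an assumption made for this illustration.

```python
import math


def dyadic_approx(value, n):
    """f_n at a point where f takes `value` (a non-negative float or math.inf)."""
    if value == math.inf:
        return float(n)  # the truncation at n is what remains
    return min(math.floor(2 ** n * value) / 2 ** n, n)


# Sample values of f at a few (unspecified) points.
samples = [0.0, 0.3, 1.75, 2.5, 7.1]
table = {v: [dyadic_approx(v, n) for n in range(1, 12)] for v in samples}
```

For each sample value the list in `table` is non-decreasing, never exceeds the value, and lies within $2^{-n}$ of it as soon as $n$ exceeds the value, in line with the three claims of the proof.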

Exercise 5.2.9 Suppose that $(\Omega, \mathcal{F})$ is a measurable space, and that $\mathcal{A} = \{A_n : n \in \mathbb{N}\}$ is a partition of
$\Omega$ which generates $\mathcal{F}$, i.e. $\sigma(\mathcal{A}) = \mathcal{F}$. (Recall that each element of $\mathcal{F}$ is then a union of some of the $A_n$'s.) We
show that the measurable functions are precisely those which are constant on the blocks $A_n$.

(a) Show that if $f : \Omega \to \mathbb{R}$ is $\mathcal{F}/\mathcal{B}(\mathbb{R})$-measurable, then $f$ is constant on each block $A_n$ (i.e. if $\omega_1, \omega_2 \in A_n$
for some $n \in \mathbb{N}$, then $f(\omega_1) = f(\omega_2)$).

(b) Show, conversely, that if $f$ is constant on each block, then $f$ is $\mathcal{F}/\mathcal{B}(\mathbb{R})$-measurable.

□ 

5.3 Measures and $\sigma$-algebras from Measurable Functions

The next proposition shows that measures can be pushed forward along measurable functions.



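Proposition 5.3.1 Suppose that $f : (\Omega, \mathcal{F}) \to (S, \mathcal{S})$ is measurable, and that $\mu$ is a (probability) measure on $(\Omega, \mathcal{F})$. Define a set function $\mu f^{-1}$ on $\mathcal{S}$ by

$$(\mu f^{-1})(T) = \mu(f^{-1}[T])$$

Then $\mu f^{-1}$ is a (probability) measure on $(S, \mathcal{S})$.

□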

Exercise 5.3.2 Prove Propn. 5.3.1. 

□ 

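Remarks 5.3.3 If $X : (\Omega, \mathcal{F}, \mathbb{P}) \to \mathbb{R}$ is a random variable, then $\mathbb{P}X^{-1}$ is a probability measure on $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$,
called the distribution or law of the random variable $X$. Note that

$$(\mathbb{P}X^{-1})B = \mathbb{P}(X \in B)$$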

□ 
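
On a finite probability space the law $\mathbb{P}X^{-1}$ can be tabulated by summing the probabilities of the outcomes in each fibre of $X$. The following sketch does this for a fair die; the random variable `X` used here (the remainder modulo 3) is a hypothetical choice made only for illustration, and the helper names are assumptions.

```python
from collections import defaultdict
from fractions import Fraction

# Fair die: P({w}) = 1/6 for w = 1, ..., 6.
P = {w: Fraction(1, 6) for w in range(1, 7)}


def X(w):
    """A hypothetical random variable on the die space."""
    return w % 3


def pushforward(prob, rv):
    """Point masses of the law: x -> P(rv = x)."""
    law = defaultdict(Fraction)
    for w, p in prob.items():
        law[rv(w)] += p
    return dict(law)


def law_of_set(law, B):
    """(P X^{-1})(B) = P(X in B) for a set B of real numbers."""
    return sum((p for x, p in law.items() if x in B), Fraction(0))


law = pushforward(P, X)
```

Since $X$ takes only finitely many values, the law is determined by its point masses, and `law_of_set` evaluates it on any set `B` by summing the masses of the values that lie in `B`.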

Exercise 5.3.4 (a) Suppose that (n, T, P) is the die space, i.e. Q, = {1,2, . . . ,Q} , T = V{Q) and P(w) = | 
for all Lo & Q,. Define X : Q. ^ ^ : uj t-^ — ^. Show that X is J-"/jB(R)-measurable, and determine law 
of X, i.e. the measure PX"^ on (E,Z?(E)). 

(b) Suppose that F : K ^ K : x a;^. Show that F is a Borel function, and calculate AF^'^[— 1, 3] (where A 
is Lebesgue measure). 

□ 

A measure $\mu$ on $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$ is said to be locally finite iff $\mu(I) < \infty$ for every compact interval
$I$. The next theorem states that there is a one-to-one correspondence between locally finite
measures and increasing right-continuous functions.

Theorem 5.3.5 (a) Suppose that $F : \mathbb{R} \to \mathbb{R}$ is a right-continuous increasing function
with $F(0) = 0$. There is a unique locally finite measure $\mu$ on $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$ with the
property that

$$\mu(a, b] = F(b) - F(a) \qquad -\infty < a < b < \infty$$

The measure $\mu$ is called the Lebesgue-Stieltjes measure associated with $F$.

(b) Conversely, given a locally finite measure $\mu$ on $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$, there is a unique right-continuous increasing function $F$ with $F(0) = 0$ so that

$$F(b) - F(a) = \mu(a, b] \qquad -\infty < a < b < \infty$$





Exercise 5.3.6 We prove Thm. 5.3.5.

(a) Suppose that $F$ is right-continuous increasing with $F(0) = 0$. Define a function $g : \mathbb{R} \to \overline{\mathbb{R}}$ by

$$g(t) = \inf\{s \in \mathbb{R} : F(s) \ge t\} \qquad t \in \mathbb{R}$$

(Recall that $\inf \emptyset := \infty$.)

(a.1) Show that $g(t) \le x$ iff $t \le F(x)$, so that $g$ is a generalized inverse of $F$.
(a.2) Show that $g$ is increasing and left-continuous.
(a.3) Explain why $g$ is a Borel function.

(a.4) Define a measure $\mu$ on $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$ by $\mu := \lambda g^{-1}$, where $\lambda$ is Lebesgue measure. Use (a.1) to show
that $\mu(a, b] = F(b) - F(a)$ whenever $-\infty < a < b < \infty$.

(a.5) Now prove the uniqueness of $\mu$: Explain why if $\nu$ is any other measure on $\mathbb{R}$ that satisfies $\nu(a, b] =
F(b) - F(a)$ for all $-\infty < a < b < \infty$, then $\nu = \mu$.

(b) Suppose that $\mu$ is a locally finite measure on $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$. Define

$$F(x) := \begin{cases} \mu(0, x] & \text{if } x \ge 0 \\ -\mu(x, 0] & \text{if } x < 0 \end{cases}$$

(b.1) Show that $F$ is right-continuous increasing with $F(0) = 0$.
(b.2) Show that $\mu(a, b] = F(b) - F(a)$ whenever $-\infty < a < b < \infty$.
(b.3) Show that $F$ is the unique function satisfying (b.1) and (b.2).
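
The generalized inverse $g(t) = \inf\{s : F(s) \ge t\}$ of part (a) can be approximated by bisection, because monotonicity and right-continuity make $\{s : F(s) \ge t\}$ a closed half-line. The sketch below is a numerical illustration only, with an assumed example $F(x) = x + \lfloor x \rfloor$ (increasing, right-continuous, $F(0) = 0$, with a jump at every integer); it does not replace the exercise.

```python
import math


def F(x):
    """Assumed example: increasing, right-continuous, F(0) = 0, jumps at integers."""
    return x + math.floor(x)


def generalized_inverse(func, t, lo=-100.0, hi=100.0, iterations=200):
    """Approximate inf{s : func(s) >= t} by bisection.

    Invariant: func(lo) < t <= func(hi), so the infimum lies in (lo, hi].
    """
    if not (func(lo) < t <= func(hi)):
        raise ValueError("t is outside the range bracketed by [lo, hi]")
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if func(mid) >= t:
            hi = mid
        else:
            lo = mid
    return hi


g_values = {t: generalized_inverse(F, t) for t in (0.5, 1.5, 2.0, 2.5)}
```

For this $F$, every $t$ in the gap $(1, 2]$ left by the jump at $1$ is sent to $g(t) = 1$, which is how a jump of $F$ becomes a flat piece of $g$.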

We can also pull back $\sigma$-algebras along measurable functions. We already introduced the
notion of a $\sigma$-algebra generated by a family of sets. We can use this to define the notion of
a $\sigma$-algebra generated by a random variable.



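Definition and Proposition 5.3.7 (a) Let $(S, \mathcal{S})$ be a measurable space, and suppose that
$\mathcal{X}$ is a collection of functions $\Omega \to S$. There is a smallest $\sigma$-algebra on $\Omega$, denoted by
$\sigma(\mathcal{X})$, such that all $X \in \mathcal{X}$ are $\sigma(\mathcal{X})/\mathcal{S}$-measurable. $\sigma(\mathcal{X})$ is called the $\sigma$-algebra
generated by $\mathcal{X}$.

We also write $\sigma(X_i : i \in I)$ for the $\sigma$-algebra generated by the family $\mathcal{X} = \{X_i : i \in I\}$.
(b) If $X$ is a single function $\Omega \to S$, then

$$\sigma(X) = \{X^{-1}[T] : T \in \mathcal{S}\}$$



Proof: (a) Let

$$\mathcal{C} = \{X^{-1}[T] : X \in \mathcal{X}, T \in \mathcal{S}\}$$

Then $\mathcal{C}$ is a family of subsets of $\Omega$, and clearly $\sigma(\mathcal{X}) = \sigma(\mathcal{C})$. (We already know what is meant
by $\sigma(\mathcal{C})$, as $\mathcal{C}$ is a family of sets.)

(b) By (a), $\sigma(X)$ is the smallest $\sigma$-algebra which includes the family $\mathcal{C} = \{X^{-1}[T] : T \in
\mathcal{S}\}$. However, by Propn. 5.1.3, $\mathcal{C}$ is a $\sigma$-algebra, and thus $\mathcal{C} = \sigma(X)$.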






5.4 Information 

In the probabilistic framework, $\sigma$-algebras play the role of carriers of information.
Earlier, we saw that if $(\Omega, \mathcal{F}, \mathbb{P})$ is a probability space, then

• $\mathcal{F}$ is the set containing all those events for which it can be decided whether or not they
occurred.

• If $\mathcal{C}$ is a family of events, then $\sigma(\mathcal{C})$ is the set containing all those events for which it
can be decided whether or not they occurred, given that we can decide all the events in
$\mathcal{C}$.

For a random variable $X$ on a probability space $\Omega$, the $\sigma$-algebra $\sigma(X)$ can be interpreted in
two ways (which are two sides of the same coin):

(i) $\sigma(X)$ is the information carried by $X$: It is the set of all events that can be decided,
given that we know the value of $X$.

(ii) It is the smallest amount of information that we need in order to know the value of $X$.

Example 5.4.1 For example, consider the experiment of rolling a die, so that $\Omega = \{1, 2, \dots, 6\}$ and
$\mathcal{F} = \mathcal{P}(\Omega)$. Let the random variable $X : \Omega \to \mathbb{R}$ be defined by

$$X(\omega) = \begin{cases} 0 & \text{if } \omega \text{ is even} \\ 1 & \text{if } \omega \text{ is odd} \end{cases}$$

It is easy to check that

$$\sigma(X) = \{\emptyset, \{1, 3, 5\}, \{2, 4, 6\}, \Omega\}$$

Let's consider our interpretations (i) and (ii) above:

(i) If we know the value of $X$, all we know is whether the outcome of rolling the die is an even number or
an odd number, i.e. all we can decide is whether $\{2, 4, 6\}$ or $\{1, 3, 5\}$ occurred (in addition to being able
to decide the certain and impossible events).

(ii) To know the value of $X$, all we need to know is whether the outcome of the die roll was even or odd. We
do not need to know the exact outcome of rolling the die.

□ 
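The $\sigma$-algebra in the die example can also be computed by brute force. The following is a minimal sketch, assuming a finite sample space so that it suffices to take preimages of all subsets of the (finite) range of $X$; the function name `sigma_algebra_generated_by` is made up for this illustration.

```python
from itertools import combinations

def sigma_algebra_generated_by(X, omega):
    """Return sigma(X) = {X^{-1}[T]} for a function X on a finite sample space.

    Because omega is finite, it is enough to let T range over all subsets
    of the finite set of values taken by X.
    """
    values = sorted({X(w) for w in omega})
    events = set()
    for r in range(len(values) + 1):
        for T in combinations(values, r):
            preimage = frozenset(w for w in omega if X(w) in T)
            events.add(preimage)
    return events

omega = range(1, 7)     # outcomes of one die roll
X = lambda w: w % 2     # 0 if w is even, 1 if w is odd
sigma_X = sigma_algebra_generated_by(X, omega)

# List the events, smallest first.
for event in sorted(sigma_X, key=lambda e: (len(e), sorted(e))):
    print(sorted(event))
```

Only four events appear: knowing $X$ lets us decide the impossible event, "odd", "even" and the certain event, and nothing finer.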

Example 5.4.2 ADD EXAMPLE ABOUT STOCK PRICE EVOLUTION, FILTRATION, 
ETC. 

□ 

Exercise 5.4.3 Suppose that $X : (\Omega, \mathcal{F}) \to (\mathbb{R}, \mathcal{B}(\mathbb{R}))$ is a function, and that $\omega_1, \omega_2 \in \Omega$ are two elements
with the following property:

For all $F \in \mathcal{F}$ we have $\omega_1 \in F \iff \omega_2 \in F$

Show that if $X$ is $\mathcal{F}$-measurable, then $X(\omega_1) = X(\omega_2)$. Thus if $\mathcal{F}$ cannot distinguish between $\omega_1$ and $\omega_2$,
neither can any $\mathcal{F}$-measurable random variable.
[Hint: Define $x := X(\omega_1)$ and consider $X^{-1}\{x\}$.]

□ 






If $X, Y$ are random variables such that $\sigma(Y) \subseteq \sigma(X)$, then the information needed to
determine the value of $Y$ is a subset of the information required to determine the value of $X$.
Hence, if we know the value of $X$, we should also know the value of $Y$. This suggests that $Y$
is a function of $X$. The following theorem makes this precise.

Theorem 5.4.4 (Doob-Dynkin Lemma)

Suppose that $X_i, Y : (\Omega, \mathcal{F}) \to (\mathbb{R}, \mathcal{B}(\mathbb{R}))$ ($i = 1, \dots, n$) are measurable. Then $Y$
is $\sigma(X_1, \dots, X_n)$-measurable iff there is a Borel function $h : \mathbb{R}^n \to \mathbb{R}$ such that $Y = h(X_1, \dots, X_n)$.

Proof: ($\Leftarrow$): We first show that the map $X : \Omega \to \mathbb{R}^n : \omega \mapsto (X_1(\omega), \dots, X_n(\omega))$ is
$\sigma(X_1, \dots, X_n)/\mathcal{B}(\mathbb{R}^n)$-measurable. By Propn. 5.1.11, it suffices to check that $X^{-1}\prod_{i=1}^n (-\infty, c_i] \in \sigma(X_1, \dots, X_n)$
for all $(c_1, \dots, c_n) \in \mathbb{R}^n$, because the family of these lower orthants generates $\mathcal{B}(\mathbb{R}^n)$. But

$$X^{-1}\prod_{i=1}^n (-\infty, c_i] = \bigcap_{i=1}^n X_i^{-1}(-\infty, c_i]$$

so this is obvious. Now $h(X_1, \dots, X_n) = h \circ X$ is a composition of measurable functions, and
hence measurable.

($\Rightarrow$): First assume that $Y$ is simple, i.e. $Y = \sum_{j=1}^m y_j I_{A_j}$ for some family of mutually
disjoint sets $A_j$ (cf. Propn. 5.2.7). Since $Y$ is assumed to be $\sigma(X_1, \dots, X_n)$-measurable, we
see that each $A_j = Y^{-1}\{y_j\}$ belongs to $\sigma(X_1, \dots, X_n)$. Define $X = (X_1, \dots, X_n)$, as above.
Reasoning as in Propn. 5.3.7, it is easy to see that

$$A \in \sigma(X_1, \dots, X_n) \quad\text{iff}\quad A = X^{-1}B \text{ for some } B \in \mathcal{B}(\mathbb{R}^n)$$

and thus $A_j = X^{-1}B_j$ for some $B_j \in \mathcal{B}(\mathbb{R}^n)$. Now define

$$h := \sum_{j=1}^m y_j I_{B_j}$$

Then $h(X_1, \dots, X_n) = \sum_{j=1}^m y_j I_{X^{-1}B_j} = Y$, as required.

Now assume that $Y$ is an arbitrary $\sigma(X_1, \dots, X_n)$-measurable random variable. Choose
a sequence of simple $\sigma(X_1, \dots, X_n)$-measurable random variables $Y_k$ ($k \in \mathbb{N}$) such that $Y_k \to Y$ pointwise (cf. Propn.
5.2.8). Hence there exist Borel functions $f_k$ such that $Y_k = f_k(X_1, \dots, X_n)$. Let $M = \{x \in \mathbb{R}^n : (f_k(x))_k \text{ converges}\}$.
Then $M \in \mathcal{B}(\mathbb{R}^n)$ (e.g. $M = g^{-1}\{0\}$, where $g = \limsup_k f_k - \liminf_k f_k$).

Define $f : \mathbb{R}^n \to \mathbb{R}$ by

$$f(x) = \begin{cases} \lim_k f_k(x) & \text{if } x \in M \\ 0 & \text{else} \end{cases}$$

Then $f = \lim_k f_k I_M$ is a Borel function. Now

$$Y(\omega) = \lim_k Y_k(\omega) = \lim_k f_k(X_1(\omega), \dots, X_n(\omega))$$

which implies two things: (i) $(X_1(\omega), \dots, X_n(\omega)) \in M$, and (ii) $Y = f(X_1, \dots, X_n)$, as
required.
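On a finite sample space the Doob-Dynkin Lemma reduces to a simple check: $Y$ is $\sigma(X)$-measurable exactly when $Y$ is constant on each level set of $X$, and then $h$ can be read off as a lookup table. The sketch below illustrates this under that finiteness assumption; `factor_through`, `parity`, `payoff` and `exact` are names invented for the example.

```python
def factor_through(X, Y, omega):
    """Try to build h with Y = h(X) on a finite sample space.

    Returns a dict h mapping each value of X to the corresponding value
    of Y, or None if Y takes two different values on some level set of X
    (in which case Y is not a function of X).
    """
    h = {}
    for w in omega:
        x, y = X(w), Y(w)
        # setdefault stores y the first time x is seen, and returns the
        # previously stored value otherwise.
        if h.setdefault(x, y) != y:
            return None
    return h

omega = range(1, 7)
parity = lambda w: w % 2                  # X: 0 if even, 1 if odd
payoff = lambda w: 10 if w % 2 else -10   # depends on the parity only
exact = lambda w: w                       # needs the exact outcome

h = factor_through(parity, payoff, omega)
no_h = factor_through(parity, exact, omega)
```

Here `h` is a table with `payoff(w) == h[parity(w)]` for every outcome, while `no_h` is `None` because the exact outcome carries more information than the parity.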






Chapter 6 

Integration 



6.1 Definition and Basic Properties 

The aim of this section is to define the integral $\int f\,d\mu$ of a measurable function $f$ w.r.t. a
measure $\mu$. Why do we want this? Because



Expectation = Integration 

Throughout this subsection, let $(S, \mathcal{S}, \mu)$ be a measure space, and let $m\mathcal{S}$ be the set of
all measurable functions from $(S, \mathcal{S})$ to $\mathbb{R}$. We will define a (partial) linear functional, also
denoted by $\mu$, or by $\int \cdot\, d\mu$, from $m\mathcal{S}$ to $\mathbb{R}$, i.e.

$$f \mapsto \mu f = \int f\,d\mu$$

The quantity $\int f\,d\mu$ need not exist for every measurable function $f$. If it does exist, we
say that $f$ is integrable.

For the map $\mu : m\mathcal{S} \to \mathbb{R}$ to be an integral, we would like it to satisfy the following
properties:



I. $\int I_A\,d\mu = \mu A$, i.e. $\mu I_A = \mu A$, for every $A \in \mathcal{S}$.
II. (Linearity) $\int (\alpha f + \beta g)\,d\mu = \alpha \int f\,d\mu + \beta \int g\,d\mu$

III. (Monotonicity) If $f \le g$ then $\int f\,d\mu \le \int g\,d\mu$

IV. (Continuity) Suppose that $f_n \to f$. Then $\int f_n\,d\mu \to \int f\,d\mu$.
[Actually, we won't quite get this property, but a weaker one.]



Note that (I.) states that the integral $\mu$ is, in some sense, an extension of the measure $\mu$:
Every measurable set can be identified with a measurable function (the set $A$ is identified
with the indicator function $I_A$). The integral $\int f\,d\mu = \mu f$ can be thought of as extending
the measure $\mu$ from sets to functions.

The definition of the integral proceeds in three steps: 

(i) Define the integral on the set $s\mathcal{S}^+$ of non-negative simple functions.

(ii) Extend the definition to the set $m\mathcal{S}^+$ of all non-negative measurable functions.

(iii) Finally extend the definition to the set mS of measurable functions. 

If $\varphi$ is a non-negative simple function, there is only one way to define the integral to be
consistent with (I.) and (II.): If $\varphi = \sum_{k=1}^n a_k I_{A_k}$, then define

$$\mu\varphi = \sum_{k=1}^n a_k\,\mu A_k$$
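For a concrete feel, the integral of a non-negative simple function $\sum_k a_k I_{A_k}$ is just the weighted sum $\sum_k a_k\,\mu A_k$. The sketch below evaluates it on a hypothetical measure space (one roll of a fair die, chosen only for illustration) and checks that two different representations of the same function give the same value; `integrate_simple` is an invented helper name.

```python
from fractions import Fraction

def integrate_simple(terms, mu):
    """Integral of phi = sum_k a_k * I_{A_k} w.r.t. a measure mu.

    terms: list of (a_k, A_k) pairs, with a_k >= 0 and A_k a frozenset.
    mu:    function returning the measure of a set.
    """
    return sum(a * mu(A) for a, A in terms)

# Hypothetical measure space: one roll of a fair die.
S = frozenset(range(1, 7))
mu = lambda A: Fraction(len(A), 6)

# Two representations of the same function (3 on the evens, 1 on the odds).
rep1 = [(3, frozenset({2, 4, 6})), (1, frozenset({1, 3, 5}))]
rep2 = [(1, S), (2, frozenset({2, 4, 6}))]

value1 = integrate_simple(rep1, mu)
value2 = integrate_simple(rep2, mu)
```

Both representations give the same number, which is the kind of independence of representation that has to be verified in general.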

Some things need checking: 



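Proposition 6.1.1 (a) The definition of $\mu\varphi$ doesn't depend on the representation of $\varphi$
as a linear combination of indicators, i.e. if $\varphi = \sum_k a_k I_{A_k} = \sum_j b_j I_{B_j}$, then $\sum_k a_k\,\mu A_k = \sum_j b_j\,\mu B_j$.

(b) $\mu I_A = \mu A$.

(c) If $\varphi, \psi \in s\mathcal{S}^+$ and $\alpha, \beta \ge 0$, then $\mu(\alpha\varphi + \beta\psi) = \alpha\,\mu\varphi + \beta\,\mu\psi$.

(d) If $\varphi \le \psi \in s\mathcal{S}^+$, then $\mu\varphi \le \mu\psi$.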

Proof: We may assume that the $a_k$ are all distinct from each other, and that the $b_j$ are all
distinct from each other. Thus $A_k, B_j \in \sigma(\varphi)$, the $\sigma$-algebra generated by $\varphi$. A little thought shows
that there is a representation $\varphi = \sum_m c_m I_{C_m}$ such that $(C_m)_m$ forms a partition of $S$,
and such that each $A_k$ and $B_j$ is a union of some $C_m$'s: just let the $C_m$'s be the blocks of the
partition that generates $\sigma(\varphi)$. In particular, for each $k, m$, either $A_k \cap C_m = \varnothing$, or $C_m \subseteq A_k$.
A similar statement holds for the $B_j$. Also $c_m = \sum_k \{a_k : C_m \subseteq A_k\}$.

(a) We have

$$\sum_k a_k\,\mu A_k = \sum_k \sum_m a_k\,\mu(A_k \cap C_m) = \sum_m \sum_k a_k\,\mu(A_k \cap C_m) = \sum_m \sum_k \{a_k : C_m \subseteq A_k\}\,\mu C_m = \sum_m c_m\,\mu C_m$$

The same computation applies to $\sum_j b_j\,\mu B_j$, so the two sums agree.

(b) is obvious. 

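(c) Suppose that $\varphi = \sum_k a_k I_{A_k}$, $\psi = \sum_j b_j I_{B_j}$, where $(A_k)_k$, $(B_j)_j$ are partitions of $S$. Then
$\varphi + \psi = \sum_{k,j} (a_k + b_j) I_{A_k \cap B_j}$ and hence

$$\mu(\varphi + \psi) = \sum_{k,j} (a_k + b_j)\,\mu(A_k \cap B_j) = \sum_k a_k\,\mu A_k + \sum_j b_j\,\mu B_j$$

(d) is obvious.

■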

If $f$ is a non-negative measurable function, then (III.) requires that we must have $\int f\,d\mu \ge \int \varphi\,d\mu$
whenever $\varphi$ is simple, with $f \ge \varphi$. We also know that there is a sequence $\varphi_n$ of
simple non-negative functions such that $\varphi_n \uparrow f$. (IV.) dictates that we should then have
$\int \varphi_n\,d\mu \to \int f\,d\mu$, and (III.) that $\lim_n \int \varphi_n\,d\mu = \sup_n \int \varphi_n\,d\mu$. The most parsimonious
choice, therefore, is to define:






Definition 6.1.2

$$\int f\,d\mu = \sup\left\{ \int \varphi\,d\mu : f \ge \varphi \in s\mathcal{S}^+ \right\}$$

Note that $\mu f$ may be equal to $+\infty$.

Exercise 6.1.3 (a) If $\varphi = \sum_k a_k I_{A_k}$ is non-negative simple, it is also non-negative measurable, and
thus we now have two definitions of $\mu\varphi$, namely

$$\mu\varphi = \sum_k a_k\,\mu A_k \quad\text{and}\quad \mu\varphi = \sup\{\mu\psi : \varphi \ge \psi \in s\mathcal{S}^+\}$$

Show that these two values of $\mu\varphi$ coincide.
(b) Verify (III.) for non-negative measurable functions, i.e. show that if $f \le g \in m\mathcal{S}^+$, then $\mu f \le \mu g$.

□ 

Proving that the integral is still linear, i.e. that (II.) holds is much more difficult, and 
requires a version of (IV.) In fact, a weak version of (IV.) forms the foundation for the whole 
edifice of integration theory: 

Theorem 6.1.4 (Monotone Convergence Theorem) 
Suppose that $f_n, f \in m\mathcal{S}^+$ are such that $f_n \uparrow f$. Then $\mu f_n \uparrow \mu f$.

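Proof: It is easy to see that $(\mu f_n)_n$ is an increasing sequence, and that each $\mu f_n \le \mu f$, so
that $\lim_n \mu f_n$ exists (in the extended reals) and $\lim_n \mu f_n \le \mu f$.

Let $f \ge \varphi \in s\mathcal{S}^+$, and suppose $\varphi = \sum_k a_k I_{A_k}$ where the $A_k$ are disjoint, and each $a_k > 0$.
For $\varepsilon > 0$, define

$$\varphi_n = \sum_k (1 - \varepsilon)\,a_k\,I_{A_k \cap \{f_n > (1-\varepsilon)a_k\}}$$

Then $\varphi_n$ is a non-negative simple measurable function with $\varphi_n \le f_n$. Hence

$$\mu f_n \ge \mu\varphi_n = (1 - \varepsilon) \sum_k a_k\,\mu(A_k \cap \{f_n > (1-\varepsilon)a_k\})$$

Note also that

$$A_k \cap \{f_n > (1-\varepsilon)a_k\} \uparrow A_k$$

for if $s \in A_k$, then $a_k = \varphi(s) \le f(s) = \lim_n f_n(s)$, so that $f_n(s) > (1-\varepsilon)a_k$ if $n$ is sufficiently large,
and thus $s \in A_k \cap \{f_n > (1-\varepsilon)a_k\}$ if $n$ is sufficiently large. By continuity properties of
measures,

$$\mu(A_k \cap \{f_n > (1-\varepsilon)a_k\}) \uparrow \mu A_k \quad\text{as } n \to \infty$$

which in turn yields

$$\mu\varphi_n \uparrow (1-\varepsilon) \sum_k a_k\,\mu A_k = (1-\varepsilon)\,\mu\varphi \quad\text{as } n \to \infty$$

Now $\mu f_n \ge \mu\varphi_n$ for each $n \in \mathbb{N}$, and thus

$$\lim_n \mu f_n \ge (1-\varepsilon)\,\mu\varphi$$

This is true for any non-negative simple $\varphi \le f$ and any $\varepsilon > 0$. Taking the supremum over
those $\varphi$, we see that

$$\lim_n \mu f_n \ge (1-\varepsilon) \sup\{\mu\varphi : f \ge \varphi \in s\mathcal{S}^+\} = (1-\varepsilon)\,\mu f$$

Letting $\varepsilon \to 0$, we conclude that $\lim_n \mu f_n \ge \mu f$.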



■

In Propn. 6.1.1(b), Exercise 6.1.3(b) and Thm 6.1.4, we have seen that (I.), (III.) and a weak 
version of (IV.) hold. We have also verified (II.) for non-negative simple functions (cf. Propn.
6.1.1(c)). Now we can verify that (II.) holds for non-negative measurable functions: 



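Proposition 6.1.5 If $f, g \in m\mathcal{S}^+$ and if $\alpha, \beta \ge 0$, then $\mu(\alpha f + \beta g) = \alpha\,\mu f + \beta\,\mu g$.

Proof: Choose sequences $(\varphi_n)_n$, $(\psi_n)_n$ of non-negative simple functions such that $\varphi_n \uparrow f$,
$\psi_n \uparrow g$. Then each $\alpha\varphi_n + \beta\psi_n$ is non-negative simple, and $(\alpha\varphi_n + \beta\psi_n) \uparrow (\alpha f + \beta g)$. Since
(II.) holds for simple functions, and by the Monotone Convergence Theorem, we see that

$$\mu(\alpha f + \beta g) = \lim_n \mu(\alpha\varphi_n + \beta\psi_n) = \alpha \lim_n \mu\varphi_n + \beta \lim_n \mu\psi_n = \alpha\,\mu f + \beta\,\mu g$$

■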

It remains to define the integral for arbitrary measurable functions. Recall that if $f \in m\mathcal{S}$,
then $f = f^+ - f^-$, where $f^+ = f \vee 0$, $f^- = -(f \wedge 0) = (-f) \vee 0$. Since $f^+, f^- \in m\mathcal{S}^+$, the
integrals $\mu f^+, \mu f^-$ have already been defined. If we want to preserve linearity, we therefore
must define $\mu f$ by

$$\int f\,d\mu = \int f^+\,d\mu - \int f^-\,d\mu$$

However, here we face a problem: If both $\mu f^+, \mu f^-$ are equal to $+\infty$, we have $\mu f = \infty - \infty$,
an indeterminate form.

Definition and Proposition 6.1.6 A function $f \in m\mathcal{S}$ is said to be $\mu$-integrable iff
$\mu|f| < \infty$. The class of all $\mu$-integrable functions is denoted by $\mathcal{L}^1(S, \mathcal{S}, \mu)$.
If $f \in \mathcal{L}^1(S, \mathcal{S}, \mu)$, we define

$$\mu f = \mu f^+ - \mu f^-$$

Then $f$ is integrable iff $\mu f^+, \mu f^- < \infty$.
Proof: Note that $|f| = f^+ + f^-$, so $\mu|f|$ is finite iff both $\mu f^+, \mu f^-$ are finite.

■



Definition 6.1.7 If $f$ is an integrable function and $A$ is a measurable set, we define

$$\int_A f\,d\mu := \int f I_A\,d\mu =: \mu(f; A)$$

to be the integral of $f$ over the set $A$.



Remarks 6.1.8 Later, we will prove the following important fact: If the Riemann integral $\int_a^b f(x)\,dx$ of
a function $f : \mathbb{R} \to \mathbb{R}$ exists, then

$$\int_a^b f(x)\,dx = \int_{[a,b]} f\,d\lambda$$

where $\lambda$ is Lebesgue measure on $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$. If the Riemann integral of a function exists, then so does the
Lebesgue integral, and the two integrals coincide. This is obvious if $f$ is a simple function, as you can easily
check, but the proof for general $f$ is deferred to a later subsection. Note that the Lebesgue integral may exist
even when the Riemann integral does not.






□ 

Obvious, but often useful, are the following facts: 
Proposition 6.1.9 (a) If $f$ is measurable, then $f$ is integrable iff $|f|$ is integrable.

(b) If $f, g$ are measurable, $g$ is integrable and $|f| \le g$, then $f$ is integrable as well.

(c) If $f$ is integrable, then $\mu\{f = \pm\infty\} = 0$, i.e. $f$ is finite $\mu$-a.e.

(d) If $f$ is integrable, then $\left|\int f\,d\mu\right| \le \int |f|\,d\mu$.

Proof: (a) is obvious. (b) follows from the fact that $\mu|f| \le \mu g < \infty$ (because we have (III.),
monotonicity, for non-negative measurable functions).

(c) Let $A = \{s \in S : |f(s)| = \infty\}$. Then $n I_A \le |f|$ for all $n \in \mathbb{N}$, and hence $n\,\mu A \le \mu|f|$.
Letting $n \to \infty$, we see that we must have $\mu|f| = +\infty$ if $\mu A > 0$.

(d) follows because $|\mu f| \le |\mu f^+| + |\mu f^-| = \mu|f|$.

■

Remarks 6.1.10 Above, we have defined the integral $\mu f$ only for $f \in \mathcal{L}^1(S, \mathcal{S}, \mu)$, with the result that
$-\infty < \mu f < \infty$, i.e. $\mu f$ is a finite number. Before, we defined the integral for arbitrary $g \in m\mathcal{S}^+$, but might
then have $\mu g = +\infty$. The restriction to $\mathcal{L}^1$ is to prevent having to deal with the indeterminate form $\infty - \infty$.
However, $\infty - c$ and $c - \infty$ are perfectly fine if $c \ne \infty$. So we sometimes can define the integral of a measurable
function $f$ in an extended sense: If $\mu f^+ = \infty$, but $\mu f^- = c < \infty$, then we say that $\mu f = \infty$, for example.
Nevertheless, such an $f$ is not integrable.

□ 

Exercise 6.1.11 The decomposition $f = f^+ - f^-$ is but one of many ways that $f$ can be decomposed
as a difference of non-negative measurable functions. Show that if $f = g - h$ is a difference of non-negative
functions, then $\mu f = \mu g - \mu h$. Thus the definition of the integral of $f$ is independent of the representation of
$f$ as a difference of non-negative measurable functions.
[Hint: Apply Propn. 6.1.5 to $f^+ + h = g + f^-$.]

□ 

Looking at our wish list of properties, i.e. (I.)-(IV.), we see that (I.) holds automatically. (III.)
(monotonicity) is easy: If $f \le g$, then $f^+ \le g^+$ and $f^- \ge g^-$, so $\mu f^+ \le \mu g^+$ and $\mu f^- \ge \mu g^-$
(because (III.) holds for non-negative measurable functions, cf. Exercise 6.1.3(b)), and hence
$\mu f = \mu f^+ - \mu f^- \le \mu g^+ - \mu g^- = \mu g$.

We finish this subsection by dealing with (II.) (linearity), and leave (IV.) (continuity) to
the next section.

Theorem 6.1.12 If $f, g \in \mathcal{L}^1(S, \mathcal{S}, \mu)$, and $\alpha, \beta \in \mathbb{R}$, then

$$\mu(\alpha f + \beta g) = \alpha\,\mu f + \beta\,\mu g$$

Proof: It suffices to prove that $\mu(f + g) = \mu f + \mu g$ and that $\mu(\alpha f) = \alpha\,\mu f$ (for $f, g \in \mathcal{L}^1$
and $\alpha \in \mathbb{R}$). Now

$$f + g = (f^+ + g^+) - (f^- + g^-)$$

is a representation of $f + g$ as a difference of non-negative measurable functions. By Exercise
6.1.11, it follows that $\mu(f + g) = \mu(f^+ + g^+) - \mu(f^- + g^-)$. Propn. 6.1.5 implies that
$\mu(f + g) = \mu f + \mu g$.

Similarly, an application of Propn. 6.1.5 and Exercise 6.1.11 to $\alpha f = \alpha f^+ - \alpha f^-$ (if
$\alpha \ge 0$), or $\alpha f = (-\alpha)f^- - (-\alpha)f^+$ (if $\alpha < 0$) yields the conclusion that $\mu(\alpha f) = \alpha\,\mu f$.

■

Exercise 6.1.13 Show that $\mathcal{L}^1(S, \mathcal{S}, \mu)$ is a vector space. Also give an example to show that it may not
be closed under multiplication.

□ 

6.2 Lebesgue's Dominated Convergence Theorem 

. . . the Swiss Army Knife of probability theory. . . 

The following proposition serves as stepping stone in the proof of the Dominated Conver- 
gence theorem, but is also very useful in other situations. 



Proposition 6.2.1 (a) FATOU'S LEMMA: If $f_n \in m\mathcal{S}^+$ for $n \in \mathbb{N}$, then

$$\mu(\liminf_n f_n) \le \liminf_n \mu f_n$$

(b) REVERSE FATOU LEMMA: Suppose that $f_n \in m\mathcal{S}^+$ for $n \in \mathbb{N}$, and that there exists
a $g \in \mathcal{L}^1(S, \mathcal{S}, \mu)$ such that each $f_n \le g$. Then

$$\limsup_n \mu f_n \le \mu(\limsup_n f_n)$$

Proof: (a) Let $f = \liminf_n f_n$, and define $g_n = \inf_{m \ge n} f_m$. Then $g_n \uparrow f$, and so the Monotone
Convergence Theorem implies that $\mu g_n \uparrow \mu f$. Moreover, $\mu g_n \le \inf_{m \ge n} \mu f_m$ (by monotonicity,
(III.)), and so $\mu f = \lim_n \mu g_n \le \lim_n \inf_{m \ge n} \mu f_m = \liminf_n \mu f_n$.

■

Exercise 6.2.2 Prove the Reverse Fatou Lemma by applying Fatou's Lemma to the sequence $g - f_n$. Why
do we require that $g \in \mathcal{L}^1$? Cancellation!

□ 

Remarks 6.2.3 Under suitable conditions, we see that we have

$$\mu \liminf_n f_n \le \liminf_n \mu f_n \le \limsup_n \mu f_n \le \mu \limsup_n f_n$$

This provides a useful mnemonic: The terms with the limits on the outside (of the integral) are on the inside
(of the string of inequalities).

The mnemonic "Terms with limits on the inside are on the outside" also works.

□ 



Theorem 6.2.4 (Dominated Convergence Theorem)

Suppose that $f_1, f_2, f_3, \dots$ is a sequence of measurable functions on $(S, \mathcal{S}, \mu)$ such that

(i) $\lim_n f_n(s)$ exists for all $s \in S$;

(ii) There is a $g \in \mathcal{L}^1(S, \mathcal{S}, \mu)$ such that $|f_n| \le g$ for all $n \in \mathbb{N}$.

Then the function $f = \lim_n f_n$ is in $\mathcal{L}^1(S, \mathcal{S}, \mu)$, and

$$\mu f = \lim_n \mu f_n$$






Proof: Since $|f_n| \le g$, the functions $g \pm f_n$ are non-negative measurable functions, and thus
by Fatou's lemma, we see that

$$\mu g + \liminf_n (\pm\mu f_n) = \liminf_n \mu(g \pm f_n) \ge \mu(\liminf_n (g \pm f_n)) = \mu(g \pm f) = \mu g \pm \mu f$$

Subtracting $\mu g < \infty$ from both sides, we see that $\liminf_n \mu f_n \ge \mu f$ and that $\liminf_n (-\mu f_n) \ge -\mu f$,
and thus that $\limsup_n \mu f_n \le \mu f$. Combining, we obtain

$$\mu f \le \liminf_n \mu f_n \le \limsup_n \mu f_n \le \mu f$$

■

Remarks 6.2.5 Note that the DCT states that if a sequence of measurable functions is dominated by an
integrable function, then limit and integral can be interchanged, i.e.

$$\mu(\lim_n f_n) = \lim_n \mu f_n$$

The integral of the limit is the limit of the integrals.

□ 

Exercise 6.2.6 (1.) Let $f_n = \frac{1}{n} I_{[0,n]}$ for $n \in \mathbb{N}$.

(a) Show that $f_n \to 0$ as $n \to +\infty$.

(b) Show that $\int f_n\,d\lambda = 1$ for all $n \in \mathbb{N}$.

(c) Why does this not contradict

(i) the Monotone Convergence Theorem? 

(ii) Fatou's Lemma? 

(iii) the Lebesgue Dominated Convergence Theorem? 
(2.) Let $f_n = n I_{(0,\frac{1}{n}]}$ for $n \in \mathbb{N}$.

(a) Find the function $\lim_{n \to +\infty} f_n$.

(b) Show that $\lim_{n \to +\infty} \int f_n\,d\lambda \ne \int \lim_{n \to +\infty} f_n\,d\lambda$.

(c) Why does this not contradict the Lebesgue Dominated Convergence Theorem? 

□ 
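The first sequence in the exercise, read here as $f_n = \frac{1}{n} I_{[0,n]}$, can be explored numerically. The sketch below relies only on the fact that the Lebesgue integral of $c\,I_{[a,b]}$ is $c\,(b-a)$; the helper `step_integral` is a name introduced for this illustration, not part of the notes.

```python
from fractions import Fraction

def step_integral(height, a, b):
    """Lebesgue integral of height * I_[a, b]: the height times the length b - a."""
    return height * (b - a)

ns = (1, 10, 100, 1000)
sups = [Fraction(1, n) for n in ns]                         # sup of f_n, tends to 0
integrals = [step_integral(Fraction(1, n), 0, n) for n in ns]  # integral of f_n

for n, sup_value, integral in zip(ns, sups, integrals):
    print(n, sup_value, integral)
```

The suprema shrink to $0$ while every integral stays equal to $1$, so the limit of the integrals differs from the integral of the limit. The exercise asks which hypothesis of each convergence theorem fails for this sequence.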

Remarks 6.2.7 Recall Remarks 6.1.8: The integrals $\int f(x)\,dx$ and $\int f\,d\lambda$ coincide whenever the former
exists.

An oft-used fact in calculus is that $\frac{d}{dt} \int f(x,t)\,dx = \int \frac{\partial f}{\partial t}(x,t)\,dx$, provided that $\frac{\partial f}{\partial t}$ is bounded
(differentiation under the integral sign). This can be justified via the DCT.

Let $G(t) := \int f(x,t)\,\mu(dx)$. We want to show that $G'(t_0) = \int \frac{\partial f}{\partial t}(x,t_0)\,\mu(dx)$, under certain commonly
satisfied conditions: Suppose that there exists a $\mu$-integrable function $M(x)$ such that $\left|\frac{\partial f}{\partial t}(x,t)\right| \le M(x)$ for
all $x$, and all $t \in (t_0 - \delta, t_0 + \delta)$ (where $\delta > 0$). Let $(h_n)_n$ be a non-zero sequence of reals such that $h_n \to 0$,
and such that each $|h_n| < \delta$. Then

$$G'(t_0) = \lim_n \frac{G(t_0 + h_n) - G(t_0)}{h_n} = \lim_n \int \frac{f(x, t_0 + h_n) - f(x, t_0)}{h_n}\,\mu(dx) = \lim_n \int g_n(x)\,\mu(dx)$$

where $g_n(x) := \frac{f(x, t_0 + h_n) - f(x, t_0)}{h_n}$. Note that $g_n(x) \to \frac{\partial f}{\partial t}(x, t_0)$. We claim that the sequence $g_n$ is dominated
by $M$. Indeed, by the Mean Value Theorem, there is, for each $x$ and each $n \in \mathbb{N}$, a $t^* \in (t_0 - |h_n|, t_0 + |h_n|) \subseteq (t_0 - \delta, t_0 + \delta)$
such that $g_n(x) = \frac{\partial f}{\partial t}(x, t^*)$, and thus $|g_n(x)| \le M(x)$. Since $M$ is $\mu$-integrable, the DCT gives
$\lim_n \int g_n(x)\,\mu(dx) = \int \lim_n g_n(x)\,\mu(dx)$, and we are done.



□ 
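Differentiation under the integral sign can be checked numerically on a simple case. The sketch below assumes the illustrative integrand $f(x,t) = \sin(tx)$ on $[0,1]$ (not taken from the notes), for which $\left|\frac{\partial f}{\partial t}(x,t)\right| = |x\cos(tx)| \le x$, so $M(x) = x$ is an integrable dominating function. It uses a midpoint rule for the integrals and a central difference quotient for $G'$, both chosen purely for convenience.

```python
import math

def midpoint_integral(func, a, b, n=10_000):
    """Midpoint-rule approximation of the integral of func over [a, b]."""
    width = (b - a) / n
    return width * sum(func(a + (k + 0.5) * width) for k in range(n))

def G(t):
    # G(t) = integral over [0, 1] of f(x, t) = sin(t * x)
    return midpoint_integral(lambda x: math.sin(t * x), 0.0, 1.0)

t0, h = 1.0, 1e-5

# Left-hand side: derivative of G at t0, via a central difference quotient.
difference_quotient = (G(t0 + h) - G(t0 - h)) / (2 * h)

# Right-hand side: integral of the partial derivative x * cos(t0 * x).
integral_of_derivative = midpoint_integral(lambda x: x * math.cos(t0 * x), 0.0, 1.0)

print(difference_quotient, integral_of_derivative)
```

Both numbers approximate the closed-form value $\sin 1 + \cos 1 - 1$, in line with the interchange of derivative and integral justified above.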






6.3 Measure Zero 

Suppose that $(\Omega, \mathcal{F}, \mu)$ is a measure space. It may be possible to extend the measure $\mu$ to a
class of sets larger than $\mathcal{F}$, where the measure of the added new sets is determined by $\mu$ and
$\mathcal{F}$. For example, suppose that

(i) $F \in \mathcal{F}$ is such that $\mu F = 0$;

(ii) $A \subseteq F$

then "clearly" $\mu A = 0$ also. However, if $A \notin \mathcal{F}$, then $\mu A$ isn't defined. Yet, $\mu A$ "ought" to
be zero. By adding all those sets whose measure "ought" to be zero, we get a new $\sigma$-algebra
called the completion of $\mathcal{F}$ w.r.t. $\mu$.

Definition 6.3.1 Let $(\Omega, \mathcal{F}, \mu)$ be a measure space, and let $A \subseteq \Omega$.

(a) We say that $A$ is $\mu$-null if there exists $B \in \mathcal{F}$ such that $A \subseteq B$ and $\mu B = 0$.
(It is not necessary that $A \in \mathcal{F}$.)

(b) The measure space $(\Omega, \mathcal{F}, \mu)$ is said to be complete iff every $\mu$-null set is measurable,
i.e. belongs to $\mathcal{F}$.

Exercise 6.3.2 Show that a countable union of $\mu$-null sets is $\mu$-null, i.e. that if $N_n$ are $\mu$-null sets, for
$n \in \mathbb{N}$, then $\bigcup_n N_n$ is also a $\mu$-null set.

□ 

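Definition and Proposition 6.3.3 Let $(\Omega, \mathcal{F}, \mu)$ be a measure space. Let

$$\mathcal{N} := \{N \subseteq \Omega : \exists F \in \mathcal{F}\,[\mu F = 0 \wedge N \subseteq F]\}$$

be the set of $\mu$-null sets (cf. Definition 6.3.1).
(a) The family of sets

$$\mathcal{F}^\mu := \{F \cup N : F \in \mathcal{F}, N \in \mathcal{N}\}$$

is a $\sigma$-algebra, called the completion of $\mathcal{F}$ w.r.t. $\mu$.

(b) We have

$G \in \mathcal{F}^\mu$ iff there are $F_1, F_2 \in \mathcal{F}$ such that $F_1 \subseteq G \subseteq F_2$ and $\mu(F_2 - F_1) = 0$

(c) We can extend the measure $\mu$ to a measure $\bar{\mu}$ on the $\sigma$-algebra $\mathcal{F}^\mu$ in the obvious way:

If $G = F \cup N$, where $F \in \mathcal{F}$, $N \in \mathcal{N}$, define $\bar{\mu}(G) := \mu(F)$

(d) The space $(\Omega, \mathcal{F}^\mu, \bar{\mu})$ is complete.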

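Proof: (a) We first show that $\mathcal{F}^\mu$ is a $\sigma$-algebra. That $\mathcal{F}^\mu$ is closed under countable unions
follows straightforwardly from the fact that both $\mathcal{F}$ and $\mathcal{N}$ are closed under countable unions.

To check that $\mathcal{F}^\mu$ is closed under complementation, suppose that $F \cup N \in \mathcal{F}^\mu$, where $F \in \mathcal{F}$, $N \in \mathcal{N}$.
Choose $G \in \mathcal{F}$ such that $\mu G = 0$ and $N \subseteq G$. Then

$$(F \cup N)^c = (F \cup G)^c \cup [G - (F \cup N)]$$

Now $(F \cup G)^c \in \mathcal{F}$, and $G - (F \cup N) \in \mathcal{N}$ (being a subset of $G$). Hence $(F \cup N)^c \in \mathcal{F}^\mu$, proving
that $\mathcal{F}^\mu$ is a $\sigma$-algebra. Clearly $\mathcal{F}^\mu = \sigma(\mathcal{F} \cup \mathcal{N})$.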

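(b) If $F_1, F_2 \in \mathcal{F}$ are such that $\mu(F_2 - F_1) = 0$, and if $F_1 \subseteq G \subseteq F_2$, then $G = F_1 \cup (G - F_1)$,
where $(G - F_1) \subseteq (F_2 - F_1)$, so that $G - F_1 \in \mathcal{N}$. It follows that $G \in \mathcal{F}^\mu$.

Next, if $G \in \mathcal{F}^\mu$, then (by definition of $\mathcal{F}^\mu$) there are $F_1 \in \mathcal{F}$, $N \in \mathcal{N}$ such that $G = F_1 \cup N$.
Also, there is $F \in \mathcal{F}$ such that $\mu F = 0$ and $N \subseteq F$. If we now define $F_2 := F_1 \cup F$, we
see that $F_1, F_2 \in \mathcal{F}$, with $F_1 \subseteq G \subseteq F_2$, and $\mu(F_2 - F_1) \le \mu F = 0$. Thus

$$\mathcal{F}^\mu = \{G \subseteq \Omega : \exists F_1, F_2 \in \mathcal{F}\,(F_1 \subseteq G \subseteq F_2 \wedge \mu(F_2 - F_1) = 0)\}$$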

(c) We need to verify two things: That the extension $\bar{\mu}$ of $\mu$ is well-defined on $\mathcal{F}^\mu$, and
that it is a measure. To see that it is well-defined, suppose that $G = F_1 \cup N_1 = F_2 \cup N_2$ are
two representations of $G$, where $F_1, F_2 \in \mathcal{F}$, $N_1, N_2 \in \mathcal{N}$. We must show that $\mu F_1 = \mu F_2$.
But $F_1 = F_1 \cap (F_1 \cup N_1) = F_1 \cap (F_2 \cup N_2) = (F_1 \cap F_2) \cup (F_1 \cap N_2)$. It follows easily that
$\mu F_1 = \mu(F_1 \cap F_2)$. Similarly $\mu F_2 = \mu(F_1 \cap F_2)$, and hence $\mu F_1 = \mu(F_1 \cap F_2) = \mu F_2$.

Next, we show that $\bar{\mu}$ is a measure on $\mathcal{F}^\mu$: Suppose that $G_n$ ($n \in \mathbb{N}$) are mutually disjoint
members of $\mathcal{F}^\mu$. Choose $F_n \in \mathcal{F}$, $N_n \in \mathcal{N}$ such that $G_n = F_n \cup N_n$ (for $n \in \mathbb{N}$). Then
$\bigcup_n G_n = F \cup N$, where $F := \bigcup_n F_n \in \mathcal{F}$ and $N := \bigcup_n N_n \in \mathcal{N}$ (because a countable
union of $\mu$-null sets is $\mu$-null). Then by definition of the extension $\bar{\mu}$ of $\mu$ on $\mathcal{F}^\mu$, we have
$\bar{\mu} \bigcup_n G_n = \mu F = \mu \bigcup_n F_n = \sum_n \mu F_n = \sum_n \bar{\mu} G_n$, where we used the fact that $\mu$ is a measure
on $\mathcal{F}$ to deduce that $\mu \bigcup_n F_n = \sum_n \mu F_n$.

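(d) Suppose that $N$ is a null set for $(\Omega, \mathcal{F}^\mu, \bar{\mu})$. Then there exists $G \in \mathcal{F}^\mu$ such that $N \subseteq G$
and such that $\bar{\mu}(G) = 0$. There exists therefore an $F \in \mathcal{F}$ such that $G \subseteq F$ and $\mu F = 0$. Putting all this
together, we see that $N \subseteq F$ and $\mu F = 0$, and thus that $N \in \mathcal{N} \subseteq \mathcal{F}^\mu$. Thus every $\bar{\mu}$-null set
belongs to $\mathcal{F}^\mu$, as required.

■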

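Exercise 6.3.4 Show that if $(S, \mathcal{F}, \mu)$ has completion $(S, \mathcal{F}^\mu, \bar{\mu})$, then

$$\mathcal{F}^\mu = \{A \subseteq S : A \triangle F \text{ is a } \mu\text{-null set, for some } F \in \mathcal{F}\}$$

[Hint: Let $\mathcal{N}$ be the family of null sets, let $\mathcal{G} = \{A \subseteq S : A \triangle F \in \mathcal{N} \text{ for some } F \in \mathcal{F}\}$, and let $\mathcal{F}^\mu = \sigma(\mathcal{F} \cup \mathcal{N}) =
\{F \cup N : F \in \mathcal{F}, N \in \mathcal{N}\}$. First show that $\mathcal{F}, \mathcal{N} \subseteq \mathcal{G}$ and conclude that $\mathcal{F}^\mu \subseteq \mathcal{G}$. Next, note that if $A \triangle F \in \mathcal{N}$
for some $F \in \mathcal{F}$, then $A = (F - (F - A)) \cup (A - F)$, where $F \in \mathcal{F}$ and $F - A, A - F \in \mathcal{N}$.]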

□ 



Definition 6.3.5 We shall say that a statement $\Phi$ holds $\mu$-almost everywhere (or $\mu$-almost
surely if $\mu$ is a probability measure), if the set $\{\omega \in \Omega : \Phi(\omega) \text{ is not true}\}$ where
$\Phi$ fails to hold is a $\mu$-null set.



First note that completing a measure space does not create any interesting new measurable 
functions: 






Proposition 6.3.6 Let $(S, \mathcal{S}^\mu, \bar{\mu})$ be the completion of $(S, \mathcal{S}, \mu)$. Then a function $f : S \to \mathbb{R}$
is $\mathcal{S}^\mu$-measurable iff there is an $\mathcal{S}$-measurable function $g : S \to \mathbb{R}$ such that $f = g$ $\mu$-a.e.

Proof: ($\Rightarrow$): First suppose that $f = I_A$ is an indicator function. By Exercise 6.3.4, we know
that $\mathcal{S}^\mu = \{A \subseteq S : \exists F \in \mathcal{S}\,(A \triangle F \text{ is } \mu\text{-null})\}$. So if $I_A$ is $\mathcal{S}^\mu$-measurable, then $I_A = I_F$
$\mu$-a.e. for some $F \in \mathcal{S}$, where then $I_F$ is $\mathcal{S}$-measurable.

It is now straightforward to see that the proposition holds for simple functions as well.

If $f$ is an arbitrary $\mathcal{S}^\mu$-measurable function, we may choose a sequence $f_n$ of simple $\mathcal{S}^\mu$-measurable
functions such that $f_n \to f$. Then choose simple $\mathcal{S}$-measurable functions $g_n$ such
that $f_n = g_n$ $\mu$-a.e., for all $n \in \mathbb{N}$. Let $g = \limsup_n g_n$. Then $f = g$ $\mu$-a.e. (because
$\{s \in S : f(s) \ne g(s)\} \subseteq \bigcup_n \{s \in S : f_n(s) \ne g_n(s)\}$, a countable union of null sets).

($\Leftarrow$): Suppose that $f = g$ $\mu$-a.e. for some $\mathcal{S}$-measurable $g$. If $B$ is a Borel set, then
$f^{-1}(B) \triangle g^{-1}(B) \subseteq \{s \in S : f(s) \ne g(s)\}$ is a $\mu$-null set. Since $g^{-1}(B) \in \mathcal{S}$, we see that
$f^{-1}(B) \in \mathcal{S}^\mu$ (by Exercise 6.3.4).

■

Remarks 6.3.7 If $f$ is $\mathcal{S}$-measurable and if $f = g$ $\mu$-a.e., we cannot generally conclude that $g$ is also
$\mathcal{S}$-measurable. That conclusion is valid, however, if $\mathcal{S}$ is complete w.r.t. $\mu$, i.e. if $\mathcal{S} = \mathcal{S}^\mu$.

□ 

Next note that two functions which are equal /x-a.e. have the same integrals. 

Lemma 6.3.8 On $(S, \mathcal{S}, \mu)$, if $h \ge 0$ is measurable, then $\mu h = 0$ iff $h = 0$ $\mu$-a.e.

Proof: The statement is obviously true if $h$ is simple non-negative. For general $h \in m\mathcal{S}^+$,
choose simple $h_n$ such that $0 \le h_n \uparrow h$. If $\mu h = 0$, then by the MCT, $0 \le \mu h_n \le \mu h = 0$,
so that, by the above, $h_n = 0$ $\mu$-a.e. Thus also $h = \lim_n h_n = 0$ $\mu$-a.e. Conversely, if $h = 0$
$\mu$-a.e., then also $h_n = 0$ $\mu$-a.e., and hence $\mu h = \lim_n \mu h_n = 0$, by the MCT.

■

Theorem 6.3.9 On $(S, \mathcal{S}, \mu)$, if $f, g$ are measurable functions such that $f = g$ $\mu$-a.e., and
if $f$ is integrable (in the extended sense), then $g$ is integrable (in the extended sense), and
$\mu f = \mu g$.

Proof: We have $0 \le |\mu f - \mu g| \le \mu|f - g|$, by Propn. 6.1.9. But $f = g$ $\mu$-a.e. iff $|f - g| = 0$
$\mu$-a.e., so Lemma 6.3.8 shows that $0 \le |\mu f - \mu g| \le 0$.

We can use this to improve the convergence theorems. For example: 

Theorem 6.3.10 (Dominated Convergence Theorem)
Suppose that $f_1, f_2, f_3, \dots$ is a sequence of measurable functions on a complete measure
space $(S, \mathcal{S}, \mu)$ such that

(i) $\lim_n f_n(s)$ exists for $\mu$-a.e. $s \in S$;

(ii) There is a $g \in \mathcal{L}^1(S, \mathcal{S}, \mu)$ such that $|f_n| \le g$ $\mu$-a.e. for all $n \in \mathbb{N}$.

Define $f$ by $f(s) = \lim_n f_n(s)$ if this limit exists, and let $f(s)$ be arbitrary otherwise. Then
$f \in \mathcal{L}^1(S, \mathcal{S}, \mu)$, and

$$\mu f = \lim_n \mu f_n$$







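Proof: Let

$$N = \{s \in S : \lim_n f_n(s) \text{ does not exist}\} \cup \{s \in S : |f_n(s)| > g(s) \text{ for some } n\}$$

Then $N$ is a null set, and thus in $\mathcal{S}$ (because the measure space is assumed complete). Define

$$\tilde{f}_n = f_n I_{N^c} \qquad \tilde{g} = g I_{N^c} \qquad \tilde{f} = f I_{N^c}$$

These functions are also $\mathcal{S}$-measurable, and we have $\mu\tilde{g} = \mu g < \infty$, and

$$\lim_n \tilde{f}_n(s) = \tilde{f}(s) \qquad |\tilde{f}_n(s)| \le \tilde{g}(s) \quad\text{for all } s \in S$$

By Theorem 6.2.4, we can conclude that $\tilde{f}$ is integrable, and that $\mu\tilde{f} = \lim_n \mu\tilde{f}_n$. But $\mu\tilde{f} = \mu f$
and $\mu\tilde{f}_n = \mu f_n$, by Theorem 6.3.9.

■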

6.4 Chain Rule, Change of Variables 

Here is another way of obtaining new measures from old: 

Definition and Proposition 6.4.1 Suppose that $f : (S, \mathcal{S}, \mu) \to (\mathbb{R}, \mathcal{B}(\mathbb{R}))$ is a non-negative
measurable function. Define a set mapping $f \cdot \mu : \mathcal{S} \to [0, \infty]$ by

$$(f \cdot \mu)A := \int_A f\,d\mu = \mu(f I_A)$$

Then $\nu = f \cdot \mu$ is a measure on $(S, \mathcal{S})$.

$f$ is called the $\mu$-density of $\nu$, and also written as the Radon-Nikodym derivative $f = \frac{d\nu}{d\mu}$.

Proof: We need only check that $f \cdot \mu$ is countably additive. Suppose that $A = \bigcup_n A_n$ is a
union of a family of mutually disjoint members of $\mathcal{S}$. Put $f_n = \sum_{k \le n} f I_{A_k}$. Then $f_n \uparrow f I_A$,
and so $\mu f_n \uparrow \mu(f I_A) = (f \cdot \mu)A$, by the MCT. But $\mu f_n = \sum_{k \le n} \mu(f I_{A_k}) = \sum_{k \le n} (f \cdot \mu)A_k$, and
thus $\mu f_n \uparrow \sum_k (f \cdot \mu)A_k$ (as $n \to \infty$). We conclude that $(f \cdot \mu)A = \sum_k (f \cdot \mu)A_k$.

■

The following proposition explains the notation $\frac{d\nu}{d\mu}$:
Proposition 6.4.2 (Chain Rule)

On $(S, \mathcal{S}, \mu)$, if $f : S \to \mathbb{R}^+$ and $g : S \to \mathbb{R}$ are measurable, then

$$\mu(fg) = (f \cdot \mu)g$$

i.e. if $\nu = f \cdot \mu$ (so that $f = \frac{d\nu}{d\mu}$), then

$$\int fg\,d\mu = \int g\,\frac{d\nu}{d\mu}\,d\mu = \int g\,d\nu$$

whenever one of these sides exists (in which case the other side exists as well, and the two
sides are equal.)






Proof: If $g = I_A$ is an indicator function, then $\mu(f I_A) = (f \cdot \mu)I_A$, by definition of $f \cdot \mu$. If
$g = \sum_{k \le n} a_k I_{A_k}$ is simple, then $\mu(f \sum_{k \le n} a_k I_{A_k}) = \sum_{k \le n} a_k\,\mu(f I_{A_k}) = \sum_{k \le n} a_k (f \cdot \mu)I_{A_k} =
(f \cdot \mu)(\sum_{k \le n} a_k I_{A_k})$, by linearity of the integral. So the result holds for simple $g$.

If $g$ is a non-negative measurable function, we may choose simple $g_n \uparrow g$ (cf. Propn.
5.2.8). Then by the MCT, $\mu(fg) = \lim_n \mu(f g_n) = \lim_n (f \cdot \mu)g_n = (f \cdot \mu)g$.

Finally, if $g$ is an arbitrary measurable function, then $\mu|fg| = \mu(f|g|) = (f \cdot \mu)|g|$, since
$f, |g|$ are non-negative. Hence $\mu(fg)$ exists iff $(f \cdot \mu)g$ exists (by Propn. 6.1.9). Now split $g$ into
its positive and negative parts to see that $\mu(fg) = \mu(fg^+) - \mu(fg^-) = (f \cdot \mu)g^+ - (f \cdot \mu)g^- =
(f \cdot \mu)g$.

■
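On a finite set every measure is given by point masses, which makes the chain rule $\mu(fg) = (f \cdot \mu)g$ easy to verify by direct summation. The following sketch uses a hypothetical four-point measure space with made-up weights; `integrate`, `mu`, `nu`, `f` and `g` are illustrative names.

```python
from fractions import Fraction

def integrate(func, weights):
    """Integral of func w.r.t. the measure with point masses `weights` on a finite set."""
    return sum(func(s) * m for s, m in weights.items())

# Hypothetical finite measure space: point masses mu{s} on S = {1, 2, 3, 4}.
mu = {1: Fraction(1, 2), 2: Fraction(1, 4), 3: Fraction(1, 8), 4: Fraction(1, 8)}
f = lambda s: s               # non-negative density
g = lambda s: (-1) ** s * s   # arbitrary real-valued integrand

# nu = f . mu has point masses nu{s} = f(s) * mu{s}.
nu = {s: f(s) * m for s, m in mu.items()}

lhs = integrate(lambda s: f(s) * g(s), mu)   # mu(f g)
rhs = integrate(g, nu)                       # (f . mu) g
```

Both sides reduce to the same finite sum $\sum_s f(s)\,g(s)\,\mu\{s\}$, which is the chain rule in its most elementary form.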

Remarks 6.4.3 The above proof illustrates a useful technique, which David Williams^ calls the standard 
machine. To prove something holds for all integrals of a certain type: 

• First show that it holds for indicator functions; 

• Use linearity to show that it holds for simple non-negative functions; 

• Then use the MCT to lift the result to non-negative measurable functions; 

• And finally split an arbitrary measurable / into its positive and negative parts, and use linearity once 
again. 

□ 

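Recall from Propn. 5.3.1 that if $f : (S, \mathcal{S}, \mu) \to (T, \mathcal{T})$ is measurable, then the map

$$\mu f^{-1} : \mathcal{T} \to \mathbb{R} : B \mapsto \mu f^{-1}[B]$$

defines a measure on $(T, \mathcal{T})$. Also if $g : (T, \mathcal{T}) \to (\mathbb{R}, \mathcal{B}(\mathbb{R}))$ is measurable, then so is
$g \circ f : (S, \mathcal{S}) \to (\mathbb{R}, \mathcal{B}(\mathbb{R}))$.

The next propn. shows that the integrals $\int g \circ f\,d\mu$ and $\int g\,d(\mu f^{-1})$ are equal:
Proposition 6.4.4 (Change of Variables)

Given a measure space $(S, \mathcal{S}, \mu)$, a measurable space $(T, \mathcal{T})$ and two measurable maps
$f : S \to T$ and $g : T \to \mathbb{R}$, then

$$\mu(g \circ f) = (\mu f^{-1})g \quad\text{i.e.}\quad \int g \circ f\,d\mu = \int g\,d(\mu f^{-1})$$

whenever one of these sides exists (in which case the other side exists as well, and the two
sides are equal.)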

□ 
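The change of variables formula $\int g \circ f\,d\mu = \int g\,d(\mu f^{-1})$ can be checked by hand on a finite space, where the image measure $\mu f^{-1}$ simply collects the mass of each fibre $f^{-1}\{t\}$. The sketch below assumes a fair die as the underlying space and uses invented maps `f` and `g`; `push_forward` is a helper name chosen for the illustration.

```python
from collections import defaultdict
from fractions import Fraction

def integrate(func, weights):
    """Integral of func w.r.t. the measure with point masses `weights`."""
    return sum(func(x) * m for x, m in weights.items())

def push_forward(weights, f):
    """Image measure mu f^{-1}: the mass of t is the total mu-mass of f^{-1}{t}."""
    image = defaultdict(Fraction)   # Fraction() is 0
    for s, m in weights.items():
        image[f(s)] += m
    return dict(image)

mu = {s: Fraction(1, 6) for s in range(1, 7)}   # fair die on S = {1, ..., 6}
f = lambda s: s % 3                              # measurable map S -> T = {0, 1, 2}
g = lambda t: t * t                              # integrand on T

image_measure = push_forward(mu, f)
lhs = integrate(lambda s: g(f(s)), mu)   # integral of g o f w.r.t. mu
rhs = integrate(g, image_measure)        # integral of g w.r.t. mu f^{-1}
```

Each value of `f` has a two-point fibre, so the image measure puts mass $\frac{1}{3}$ on each of $0, 1, 2$, and both integrals equal $\frac{5}{3}$.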



Exercise 6.4.5 Prove Propn. 6.4.4.

[Hint: Use the standard machine. For arbitrary measurable $g$, observe that $\mu|g \circ f| = \mu(|g| \circ f) = (\mu f^{-1})|g|$,
because $|g|$ is non-negative. This proves that $g \circ f$ is $\mu$-integrable iff $g$ is $\mu f^{-1}$-integrable (i.e. one side exists
iff the other exists.) ]

□ 



^cf. his excellent (and short) book Probability with Martingales. 






6.5 Riemann Integral vs. Lebesgue Integral 

Let $f$ be a real-valued function defined and bounded on an interval $[a, b]$. We recall here the
definition of the Riemann integral

$$\int_a^b f(t)\,dt$$

Define a function $I : [a, b] \to \mathbb{R}$ by

$$I(t) = \int_a^t f(s)\,ds$$

Let

$$P = \{a = t_0 < t_1 < t_2 < \dots < t_n = b\}$$

be a partition of $[a, b]$. Choose $t_k^* \in [t_{k-1}, t_k]$. Then we ought to have

$$\begin{aligned} \int_a^b f\,dt = I(b) &= \sum_{k=1}^n \big(I(t_k) - I(t_{k-1})\big) \\ &\approx \sum_{k=1}^n f(t_k^*)\,\Delta_k t \end{aligned}$$

The sum in the last line of this equation is called a Riemann sum. This approximation ought
to hold (for a sufficiently nice integrand $f$) provided that the partition $P$ is sufficiently fine,
and the smaller the sizes of the $\Delta_k t = t_k - t_{k-1}$, the better the approximation. Let

$$\|P\| = \max\{\Delta_k t : k = 1, \dots, n\}$$

and let $n(P) = n$ (i.e. the number of subintervals into which $P$ divides $[a, b]$). Define upper and lower Riemann sums as
follows:

$$U(f, P) = \sum_{k=1}^n \sup\{f(t) : t \in [t_{k-1}, t_k]\} \cdot [t_k - t_{k-1}]$$

$$L(f, P) = \sum_{k=1}^n \inf\{f(t) : t \in [t_{k-1}, t_k]\} \cdot [t_k - t_{k-1}]$$

It is obvious that we always have $L(f, P) \le U(f, P)$.
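Upper and lower Riemann sums are easy to compute exactly when $f$ is increasing, because then the infimum on $[t_{k-1}, t_k]$ is $f(t_{k-1})$ and the supremum is $f(t_k)$. The sketch below makes that monotonicity assumption (it is not valid for general $f$) and uses the illustrative integrand $f(t) = t^2$ on $[0,1]$ with a uniform partition; `riemann_sums_increasing` is a name made up for the example.

```python
from fractions import Fraction

def riemann_sums_increasing(f, a, b, n):
    """Lower and upper Riemann sums of an increasing f on a uniform partition.

    For increasing f the infimum on [t_{k-1}, t_k] is f(t_{k-1}) and the
    supremum is f(t_k), so both sums can be computed exactly.
    """
    points = [a + (b - a) * Fraction(k, n) for k in range(n + 1)]
    lower = sum(f(points[k - 1]) * (points[k] - points[k - 1]) for k in range(1, n + 1))
    upper = sum(f(points[k]) * (points[k] - points[k - 1]) for k in range(1, n + 1))
    return lower, upper

square = lambda t: t * t
lower, upper = riemann_sums_increasing(square, Fraction(0), Fraction(1), 100)
print(float(lower), float(upper), upper - lower)
```

For an increasing function on a uniform partition the gap telescopes to $(f(b) - f(a))/n$, so refining the partition squeezes the two sums together around the value $\frac{1}{3}$ of the integral.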

Next define the upper and lower Riemann integrals by:

$$\overline{\int_a^b} f\,dt := \inf\{U(f, P) : P \text{ is a partition of } [a, b]\}$$

$$\underline{\int_a^b} f\,dt := \sup\{L(f, P) : P \text{ is a partition of } [a, b]\}$$

It is easily seen that

$$\underline{\int_a^b} f\,dt \le \overline{\int_a^b} f\,dt$$

A function $f$ is said to be Riemann integrable over $[a, b]$ provided that the upper and lower
integrals are equal. In that case the Riemann integral is defined to be their common value:

$$\int_a^b f\,dt := \underline{\int_a^b} f\,dt = \overline{\int_a^b} f\,dt$$






Theorem 6.5.1 Let $f$ be a bounded real-valued function on the compact interval $[a, b]$.
Then

(a) $f$ is Riemann integrable iff $f$ is continuous $\lambda$-a.e.

(b) If $f$ is Riemann integrable, then $f$ is Lebesgue integrable, and the integrals are equal:

$$\int_a^b f\,dt = \int_{[a,b]} f\,d\lambda$$

Proof: Assume that $f$ is Riemann integrable. Then we can choose a sequence $P_n$ of successively
finer partitions of $[a, b]$ such that $U(f, P_n) - L(f, P_n) < \frac{1}{n}$. Define functions $g_n, h_n$
on $[a, b]$ as follows: For each $n$, $g_n(a) = h_n(a) = f(a)$. If $P_n = \{a = t_0^n < t_1^n < t_2^n < \dots < t_{m_n}^n = b\}$,
then $g_n, h_n$ are step functions, with steps determined by $P_n$, defined as follows: If
$t \in (a, b]$, then $t \in (t_{k-1}^n, t_k^n]$ for some $k$, and we define

$$g_n(t) = \inf\{f(x) : t_{k-1}^n \le x \le t_k^n\} \qquad h_n(t) = \sup\{f(x) : t_{k-1}^n \le x \le t_k^n\}$$

Then $g_n, h_n$ are clearly simple Borel functions, designed so that

$$\int_{[a,b]} g_n\,d\lambda = L(f, P_n) \qquad \int_{[a,b]} h_n\,d\lambda = U(f, P_n)$$

Moreover $(g_n)_n$ is a bounded increasing sequence, with $g_n \le f$, and $(h_n)_n$ is a bounded
decreasing sequence, with $h_n \ge f$. Define $g = \lim_n g_n$, $h = \lim_n h_n$. Then $g, h$ are Borel
functions, and by the DCT we have $\int_{[a,b]} g\,d\lambda = \lim_n L(f, P_n) = \int_a^b f\,dt$ and $\int_{[a,b]} h\,d\lambda =
\lim_n U(f, P_n) = \int_a^b f\,dt$. Hence $\int_{[a,b]} (h - g)\,d\lambda = 0$.

Now since $h \ge g$, Lemma 6.3.8 implies that $h = g$ $\lambda$-a.e. on $[a, b]$. Since $g \le f \le h$, we
must have $g = f = h$ $\lambda$-a.e., and thus $\int_{[a,b]} f\,d\lambda = \int_{[a,b]} g\,d\lambda = \lim_n L(f, P_n) = \int_a^b f\,dt$. This
proves (b).

Next, note that if $t \notin \bigcup_n P_n$, and if $h(t) = g(t)$, then $f$ is necessarily continuous at $t$: For
then $g(t) = f(t) = h(t)$, i.e.

$$\inf_n \sup\{f(x) : t_{k_n - 1}^n \le x \le t_{k_n}^n\} = f(t) = \sup_n \inf\{f(x) : t_{k_n - 1}^n \le x \le t_{k_n}^n\}$$

(where $k_n$ is such that $t \in (t_{k_n - 1}^n, t_{k_n}^n]$), and thus all values of $f(x)$ must lie close to $f(t)$ if $x$
is close to $t$. Hence any discontinuity of $f$ must belong to $\bigcup_n P_n \cup \{t : g(t) \ne h(t)\}$, a set
of $\lambda$-measure zero. This shows that if $f$ is Riemann integrable, then $f$ is continuous $\lambda$-a.e.,
proving one direction of (a).

Conversely, suppose that $f$ is continuous $\lambda$-a.e. Let $P_n$ be a partition of $[a, b]$ that divides
it into $2^n$ subintervals of equal length, and construct simple Borel functions $g_n, h_n$ as above.
If $f$ is continuous at $t$, then obviously $\lim_n g_n(t) = f(t) = \lim_n h_n(t)$. Hence $\lim_n (h_n - g_n) = 0$
$\lambda$-a.e. By the DCT, we see that $0 = \lim_n \int_{[a,b]} (h_n - g_n)\,d\lambda = \lim_n (U(f, P_n) - L(f, P_n))$, from
which Riemann integrability easily follows.



Chapter 7 

Differentiation 



7.1 Bounded Linear Operators 

As a preliminary to the definition of the derivative of a general multivariate vector-valued function $f : \mathbb{R}^n \to \mathbb{R}^m$, we discuss continuity of linear operators. We start with a simple criterion for continuity: A linear operator $L : V \to W$ between normed vector spaces is said to be bounded if and only if there is a constant $C$ such that $\|L(x)\| \le C\|x\|$ for all $x \in V$. The next proposition shows that a linear operator is continuous if and only if it is bounded:

Proposition 7.1.1 Let $L : V \to W$ be a linear operator between normed vector spaces $V, W$. Then $L$ is continuous if and only if it is a bounded operator, i.e. iff there exists a constant $C$ such that

$$\|L(x)\| \le C\|x\| \quad \text{for all } x \in V$$

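Proof: Suppose $L$ is continuous. Choose $\delta > 0$ so that $\|L(x)\| \le 1$ whenever $\|x\| \le \delta$. Let $C \ge \frac{1}{\delta}$. If $x \in V$ with $x \ne 0$, then $\left\|\frac{\delta x}{\|x\|}\right\| \le \delta$, so $\left\|L\left(\frac{\delta x}{\|x\|}\right)\right\| \le 1$, i.e. $\|L(x)\| \le C\|x\|$. (For $x = 0$ the inequality is trivial.)

Conversely, suppose that $L$ is a bounded operator, and that $\|L(x)\| \le C\|x\|$ for all $x \in V$. To show that $L$ is continuous, it suffices to show that $L(x_n) \to L(x)$ whenever $x_n \to x$, i.e. that $\|L(x_n) - L(x)\| \to 0$ whenever $\|x_n - x\| \to 0$. But this is easy:

$$\|L(x_n) - L(x)\| = \|L(x_n - x)\| \le C\|x_n - x\| \to 0 \quad \text{as } \|x_n - x\| \to 0$$

⊣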

Before we prove that linear operators between finite-dimensional vector spaces are continuous, we need a simple Lemma:

Lemma 7.1.2 If $L : \mathbb{R}^n \to \mathbb{R}^m$ is a linear transformation, then $L$ is bounded, i.e. there exists a constant $C$ such that

$$\|L(x)\| \le C\|x\| \quad \text{for all } x \in \mathbb{R}^n$$

(where the norms are the standard Euclidean norms.)

Proof: Let $e_1, \dots, e_n$ denote the standard basis of $\mathbb{R}^n$, and put $C = n \max\{\|L(e_1)\|, \dots, \|L(e_n)\|\}$, so that each $\|L(e_i)\| \le \frac{C}{n}$. If $h = (h_1, \dots, h_n)^T \in \mathbb{R}^n$, then $h = \sum_{i=1}^n h_i e_i$, and each $|h_i| = \sqrt{h_i^2} \le \sqrt{h_1^2 + \dots + h_n^2} = \|h\|$. It follows, using the triangle inequality and the inequalities just obtained, that

$$\|L(h)\| = \Big\|L\Big(\sum_{i=1}^n h_i e_i\Big)\Big\| \le \sum_{i=1}^n |h_i| \, \|L(e_i)\| \le \sum_{i=1}^n \|h\| \frac{C}{n} = C\|h\|$$

⊣
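The constant in Lemma 7.1.2 can be checked numerically. The following is only an illustrative sketch: the matrix `A` is a hypothetical example (not taken from the text), and the helper names `apply`, `norm` and `lemma_constant` are introduced here purely for the illustration. It computes $C = n \max_i \|L(e_i)\|$ from the columns of the matrix and compares it with the ratio $\|L(h)\| / \|h\|$ on random vectors.

```python
import math
import random


def apply(matrix, x):
    """Multiply an m x n matrix (given as a list of rows) by a vector of length n."""
    return [sum(a * b for a, b in zip(row, x)) for row in matrix]


def norm(x):
    """Standard Euclidean norm."""
    return math.sqrt(sum(c * c for c in x))


def lemma_constant(matrix):
    """C = n * max_i ||L(e_i)||; L(e_i) is the i-th column of the matrix."""
    n = len(matrix[0])
    columns = [[row[i] for row in matrix] for i in range(n)]
    return n * max(norm(col) for col in columns)


# Hypothetical 2 x 3 matrix, chosen only for illustration.
A = [[1.0, -2.0, 0.5],
     [3.0, 0.0, -1.0]]
C = lemma_constant(A)

random.seed(0)
worst_ratio = 0.0
for _ in range(10_000):
    h = [random.uniform(-1.0, 1.0) for _ in range(3)]
    if norm(h) > 0:
        worst_ratio = max(worst_ratio, norm(apply(A, h)) / norm(h))

print(f"C = {C:.4f}, largest observed ||L(h)||/||h|| = {worst_ratio:.4f}")
```

The observed ratio stays well below $C$, which reflects the fact that the constant produced by the proof is a convenient bound rather than the best possible one.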

Proposition 7.1.1 and Lemma 7.1.2 immediately imply that: 



Corollary 7.1.3 Any linear operator $L : \mathbb{R}^n \to \mathbb{R}^m$ is continuous.



Remarks 7.1.4 1. Suppose that $V, W$ are normed vector spaces, and let $\mathcal{L}(V,W)$ be the set of all bounded linear operators from $V$ to $W$. It is clearly possible to add linear operators, and to multiply them by scalars:

$$(L_1 + L_2)(x) := L_1(x) + L_2(x), \qquad (\lambda L)(x) := \lambda L(x)$$

Furthermore, the sum of two bounded linear operators is bounded: If $\|L_i(x)\| \le C_i\|x\|$ for $i = 1, 2$, then $\|(L_1 + L_2)(x)\| \le (C_1 + C_2)\|x\|$ by the triangle inequality. Similarly, $\lambda L$ is bounded if $L$ is. It follows that $\mathcal{L}(V,W)$ is a vector space. The bounds $C$ can be used to define a norm on $\mathcal{L}(V,W)$, the operator norm:

$$\|L\| = \inf\{C : \|L(x)\| \le C\|x\| \text{ for all } x\}$$

This plays an important role in functional analysis (but not in this course).

2. As every finite dimensional real vector space is isomorphic to an $\mathbb{R}^n$, the preceding corollary proves that all linear operators between finite dimensional normed vector spaces are continuous. This breaks down for infinite dimensional vector spaces. Consider, for example, the space $V = \{f : [0,1] \to \mathbb{R} : f \text{ continuous, and differentiable on } (0,1)\}$. This is a normed space, with norm defined by $\|f\| := \max_{x \in [0,1]} |f(x)|$. For $x_0 \in (0,1)$ the map $D_{x_0} : V \to \mathbb{R} : f \mapsto f'(x_0)$ is clearly linear, but it is not continuous: Define $f_n(x) := \frac{1}{n} \sin 2\pi n x$. Then $f_n \to 0$ (as $\|f_n\| \le \frac{1}{n} \to 0$). Nevertheless, with $x_0 = \frac{1}{2}$, we have $f_n'(x_0) = 2\pi \cos \pi n$, so $f_n'(x_0) \not\to 0$, i.e. $D_{x_0} f_n \not\to D_{x_0} 0$.

□ 
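The counterexample in Remark 2 is easy to tabulate. The sketch below is an illustration only: the function names are introduced here, the derivative is the closed form obtained by hand, and the maximum norm is approximated on a uniform grid (an assumption of this sketch, not an exact computation).

```python
import math


def f_n(n, x):
    """f_n(x) = (1/n) sin(2 pi n x), as in Remark 2."""
    return math.sin(2 * math.pi * n * x) / n


def df_n(n, x):
    """Derivative of f_n, differentiated by hand: 2 pi cos(2 pi n x)."""
    return 2 * math.pi * math.cos(2 * math.pi * n * x)


def sup_norm(n, samples=20_001):
    """Approximate max |f_n| on [0, 1] using a uniform grid."""
    return max(abs(f_n(n, k / (samples - 1))) for k in range(samples))


for n in (1, 10, 100):
    print(f"n = {n:3d}  ||f_n|| ~ {sup_norm(n):.5f}  |f_n'(1/2)| = {abs(df_n(n, 0.5)):.5f}")
```

The norms shrink like $1/n$ while the derivative at $x_0 = \frac{1}{2}$ keeps absolute value $2\pi$, so no constant $C$ can satisfy $|D_{x_0} f| \le C\|f\|$ for all $f \in V$.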



7.2 The Derivative 

7.2.1 Definition of the Derivative 

Recall that a function $f : \mathbb{R} \to \mathbb{R}$ is differentiable at $x_0 \in \mathbb{R}$ if and only if there is a number $a$ such that

$$\lim_{x \to x_0} \frac{f(x) - f(x_0)}{x - x_0} = a, \quad \text{equivalently} \quad \lim_{h \to 0} \frac{f(x_0 + h) - f(x_0)}{h} = a \qquad (*)$$

The number $a$ is called the derivative of $f$ at $x_0$, and denoted $f'(x_0)$.

If we try to extend this definition blindly to functions $f : \mathbb{R}^n \to \mathbb{R}^m$ we run into trouble: $x_0, h, f(x_0)$, etc. are vectors, and one cannot divide by vectors. A careful analysis of the meaning of (*) is therefore necessary.



Examples 7.2.1 1. Define

$$\varepsilon_{x_0}(x) := \frac{f(x) - f(x_0)}{x - x_0} - a$$

so that

$$f(x) = f(x_0) + a(x - x_0) + \varepsilon_{x_0}(x)(x - x_0)$$

(*) says that $\varepsilon_{x_0}(x) \to 0$ as $x \to x_0$. Note that $\varepsilon_{x_0}(x)(x - x_0) \to 0$ "doubly fast" as $x \to x_0$: First, because $\varepsilon_{x_0}(x) \to 0$ as $x \to x_0$, and secondly because in addition $x - x_0 \to 0$. Thus for $x$ close to $x_0$, we have

$$f(x) = f(x_0) + a(x - x_0) + \text{something very small}$$

In beginners' courses on calculus, the process of finding a derivative is usually introduced as the process of finding a tangent. Assuming $f$ is "smooth" at $x_0$, the tangent is the straight line $y = f(x_0) + a(x - x_0)$ which best approximates the function $f$ in a neighbourhood of $x_0$. Thus $\varepsilon_{x_0}(x)(x - x_0)$ is the amount by which the function $f$ deviates from the tangent, and it is "doubly small" for $x$ close to $x_0$.

This idea of linearization is at the heart of differential calculus. Note that every linear function $L : \mathbb{R} \to \mathbb{R}$ is given by multiplication by a constant, i.e. $L(x) = ax$ for some $a \in \mathbb{R}$. Indeed, if we define $a := L(1)$, then by linearity, we have $L(x) = L(x \cdot 1) = x \cdot L(1) = ax$. We now see that we have $f(x) = f(x_0) + L(x - x_0) + \varepsilon_{x_0}(x)(x - x_0)$, which means that the change in $f$ at $x_0$ is roughly linear:

$$f(x) - f(x_0) = L(x - x_0) + \text{something very small}$$

for some linear operator $L : \mathbb{R} \to \mathbb{R}$.

2. Let's see if we can extend this idea: Consider a smooth surface in $\mathbb{R}^3$, given by a function $f : \mathbb{R}^2 \to \mathbb{R}$, written as $z = f(\mathbf{x})$, where $\mathbf{x} = (x, y) \in \mathbb{R}^2$. The best "straight" approximation of $f$ near a point $\mathbf{x}_0 \in \mathbb{R}^2$ is given by the tangent plane at that point:

$$z = f(x_0, y_0) + a(x - x_0) + b(y - y_0)$$

(i.e. the tangent plane is $z = ax + by + c$, where $c = f(x_0, y_0) - ax_0 - by_0$.) Thus

$$f(x, y) = f(x_0, y_0) + a(x - x_0) + b(y - y_0) + \text{something very small}$$

for $(x, y)$ near $(x_0, y_0)$. Here, again, we have linearization: Every linear function $L : \mathbb{R}^2 \to \mathbb{R}$ is given by $L(x, y) = ax + by$ for some constants $a, b$. Indeed, define $a := L(1, 0)$ and $b := L(0, 1)$. Linearity of $L$ then implies that

$$L(x, y) = L(x(1, 0) + y(0, 1)) = xL(1, 0) + yL(0, 1) = ax + by$$

We thus have

$$f(\mathbf{x}) = f(\mathbf{x}_0) + L(\mathbf{x} - \mathbf{x}_0) + \text{something very small}$$

for some linear operator $L : \mathbb{R}^2 \to \mathbb{R}$.

□ 

We are now almost ready to define the notion of derivative for a map $f : \mathbb{R}^n \to \mathbb{R}^m$. We want to define the derivative $Df(x_0)$ of $f$ at the point $x_0$ to be a linear operator with the property that

$$f(x) = f(x_0) + Df(x_0)(x - x_0) + \text{something very small}$$

The problem is defining the "something small". Example 7.2.1(1) points the way:



Definition 7.2.2 Suppose that $U \subseteq \mathbb{R}^n$ and that $x_0$ is an interior point of $U$. We say that $f : U \to \mathbb{R}^m$ is differentiable at $x_0$ if there is a linear operator $L : \mathbb{R}^n \to \mathbb{R}^m$ such that

$$f(x) = f(x_0) + L(x - x_0) + \varepsilon_{x_0}(x) \cdot \|x - x_0\| \qquad (**)$$

where $\varepsilon_{x_0} : U \to \mathbb{R}^m$ has the property that $\varepsilon_{x_0}(x) \to 0$ as $x \to x_0$.

The linear operator $L$ is called the derivative (or Fréchet derivative) of $f$ at $x_0$, and denoted by $L = Df(x_0)$.

If $f$ is differentiable at every point of $U$, we say that $f$ is differentiable on $U$.






Note that $f(x), f(x_0), L(x - x_0), \varepsilon_{x_0}(x) \in \mathbb{R}^m$, whereas $x - x_0 \in \mathbb{R}^n$. It is not possible to multiply the vectors $\varepsilon_{x_0}(x)$ and $x - x_0$. We can, however, multiply $\varepsilon_{x_0}(x)$ with the scalar $\|x - x_0\|$. We then maintain the idea that $\varepsilon_{x_0}(x) \cdot \|x - x_0\| \to 0$ "doubly fast" as $x \to x_0$: Firstly, because $\varepsilon_{x_0}(x) \to 0$ as $x \to x_0$, and secondly because then in addition $\|x - x_0\| \to 0$ as well.

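There's a loose end we need to tie up immediately: In Definition 7.2.2, we define $Df(x_0)$ to be "the" linear operator satisfying (**). But what if there is more than one such operator? There isn't, but to prove it, we first need a lemma:

Lemma 7.2.3 Suppose that $L : \mathbb{R}^n \to \mathbb{R}^m$ is a linear operator, and define $\varepsilon : \mathbb{R}^n \setminus \{0\} \to \mathbb{R}^m$ by $\varepsilon(h) = \frac{L(h)}{\|h\|}$. If $\varepsilon(h) \to 0$ as $h \to 0$, then $L = 0$, the constant map with value $0$.

Proof: Note that for $\alpha > 0$ and $h \ne 0$ we have

$$L(h) = \alpha^{-1} L(\alpha h) = \alpha^{-1} \varepsilon(\alpha h) \|\alpha h\| = \varepsilon(\alpha h) \|h\|$$

Now $\varepsilon(\alpha h) \to 0$ as $\alpha \to 0$, and so $L(h) = 0$.

⊣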

Proposition 7.2.4 The derivative, if it exists, is unique, i.e.: Suppose that $L_1, L_2$ are linear operators satisfying

$$f(x) = f(x_0) + L_i(x - x_0) + \varepsilon_i(x) \cdot \|x - x_0\| \qquad i = 1, 2$$

where $\varepsilon_i(x) \to 0$ as $x \to x_0$. Then $L_1 = L_2$.

Proof: Subtracting, we see that $L_1(x - x_0) - L_2(x - x_0) + \varepsilon_1(x) \cdot \|x - x_0\| - \varepsilon_2(x) \cdot \|x - x_0\| = 0$, i.e. that

$$L(h) = \varepsilon(h) \|h\|$$

where $L := L_1 - L_2$, $h := x - x_0$, and $\varepsilon(h) := \varepsilon_2(x_0 + h) - \varepsilon_1(x_0 + h)$. Note that $\|\varepsilon(h)\| \le \|\varepsilon_1(x_0 + h)\| + \|\varepsilon_2(x_0 + h)\|$, and that $\varepsilon_i(x_0 + h) \to 0$ as $h \to 0$, so that also $\varepsilon(h) \to 0$ as $h \to 0$. By the preceding lemma, $L = 0$ identically, i.e. $L_1 = L_2$.

⊣



It is often convenient to recast Definition 7.2.2 in the following equivalent form:

Proposition 7.2.5 $f : U \subseteq \mathbb{R}^n \to \mathbb{R}^m$ is differentiable at $x_0$ if and only if there is a linear operator $L : \mathbb{R}^n \to \mathbb{R}^m$ such that

$$\lim_{h \to 0} \frac{\|f(x_0 + h) - f(x_0) - L(h)\|}{\|h\|} = 0$$

Then $L = Df(x_0)$.

Remark 7.2.6 Note that we may have two different norms in the expression $\frac{\|f(x_0 + h) - f(x_0) - L(h)\|}{\|h\|}$. When $f : \mathbb{R}^n \to \mathbb{R}^m$, the norm in the numerator is the $\mathbb{R}^m$-norm, whereas the norm in the denominator is the $\mathbb{R}^n$-norm.

□



Proof: By definition, $f$ is differentiable at $x_0$ if and only if there exist a linear operator $L$ and a map $\varepsilon$ such that $f(x) = f(x_0) + L(x - x_0) + \varepsilon(x) \|x - x_0\|$, where $\varepsilon(x) \to 0$ as $x \to x_0$. Put $h := x - x_0$. Then we have

$$f(x_0 + h) = f(x_0) + L(h) + \varepsilon(x_0 + h) \|h\|$$

and so

$$\frac{\|f(x_0 + h) - f(x_0) - L(h)\|}{\|h\|} = \|\varepsilon(x_0 + h)\|$$

Taking $\lim_{h \to 0}$ on both sides yields the result.

⊣

Once more:

A function $f : \mathbb{R}^n \to \mathbb{R}^m$ is differentiable at a point $x_0 \in \mathbb{R}^n$ if and only if there exists a linear transformation $L : \mathbb{R}^n \to \mathbb{R}^m$ with the property that the function

$$\varepsilon : h \mapsto \frac{1}{\|h\|} \big(f(x_0 + h) - f(x_0) - L(h)\big)$$

is such that $\varepsilon(h) \to 0$ as $h \to 0$.
Then $L = Df(x_0)$.


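Example 7.2.7 Consider the function $f : \mathbb{R}^2 \to \mathbb{R} : (x, y) \mapsto x^2 + y$ at the point $\mathbf{x}_0 = (2, 1)^T$. With $\mathbf{h} = (h, k)^T$ we have

$$f(\mathbf{x}_0 + \mathbf{h}) - f(\mathbf{x}_0) = (2 + h)^2 + (1 + k) - 5 = 4h + k + h^2$$

Now $4h + k$ is linear in $h, k$, and $h^2 \to 0$ "doubly fast" as $h, k \to 0$. Define, therefore,

$$L(\mathbf{h}) := 4h + k, \qquad \varepsilon(\mathbf{h}) := \frac{h^2}{\|\mathbf{h}\|}$$

i.e. $L$ is the linear operator with $1 \times 2$ matrix representation $(4 \ \ 1)$ (w.r.t. the standard bases). As it is easy to see that $\varepsilon(\mathbf{h}) \to 0$ as $\mathbf{h} \to 0$ (indeed $|\varepsilon(\mathbf{h})| \le |h|$), we conclude that $Df(\mathbf{x}_0) = L$. Thus $f$ is differentiable at the point $(2, 1)$, and $Df(2, 1) = (4 \ \ 1)$.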

□ 
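The claim of Example 7.2.7 can be watched numerically. This is a small sketch, with function names chosen here for illustration: it evaluates the remainder quotient $\varepsilon(\mathbf{h}) = \big(f(\mathbf{x}_0 + \mathbf{h}) - f(\mathbf{x}_0) - L(\mathbf{h})\big) / \|\mathbf{h}\|$ for shrinking $\mathbf{h}$ along two directions.

```python
import math


def f(x, y):
    """The function of Example 7.2.7."""
    return x * x + y


def L(h, k):
    """Candidate derivative at (2, 1): the matrix (4 1) applied to (h, k)."""
    return 4 * h + k


def eps(h, k):
    """Remainder quotient (f(x0 + h) - f(x0) - L(h)) / ||h|| at x0 = (2, 1)."""
    return (f(2 + h, 1 + k) - f(2, 1) - L(h, k)) / math.hypot(h, k)


for s in (1e-1, 1e-2, 1e-3):
    print(f"s = {s:g}: eps(s, -s) = {eps(s, -s):.6f}, eps(s, 2s) = {eps(s, 2 * s):.6f}")
```

Both columns shrink in proportion to $s$, in line with the exact value $\varepsilon(\mathbf{h}) = h^2 / \|\mathbf{h}\|$.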



Proposition 7.2.8 If $f : \mathbb{R}^n \to \mathbb{R}^m$ is differentiable at $x \in \mathbb{R}^n$, then it is continuous at $x$.

□ 



The object of the next exercise is to supply a proof: 

Exercise 7.2.9 (a) Show that if $f : \mathbb{R}^n \to \mathbb{R}^m$ is differentiable at $x_0$, then there are constants $C > 0$ and $\delta > 0$ such that

$$\|f(x) - f(x_0)\| \le C\|x - x_0\| \quad \text{whenever } \|x - x_0\| < \delta$$

(This is called the Lipschitz property of a function.)
[Hint: Write

$$f(x) = f(x_0) + Df(x_0)(x - x_0) + \varepsilon_{x_0}(x) \|x - x_0\|$$

and choose $\delta > 0$ so that $\|\varepsilon_{x_0}(x)\| < 1$ when $\|x - x_0\| < \delta$. Also use that linear operators $\mathbb{R}^n \to \mathbb{R}^m$ are bounded to find $K$ so that $\|Df(x_0)(y)\| \le K\|y\|$ for all $y \in \mathbb{R}^n$. Put $C := K + 1$.]

(b) Now use (a) to prove Proposition 7.2.8.

□ 

Examples 7.2.10 1. Consider a map $f : \mathbb{R} \to \mathbb{R}$ which is differentiable at the point $x_0 \in \mathbb{R}$. Then the usual derivative $f'(x_0)$ is a real number, whereas the derivative as we have just defined it is a linear operator $Df(x_0) : \mathbb{R} \to \mathbb{R}$. It is not hard to see that

$$Df(x_0)(h) = f'(x_0) \cdot h$$

because $\lim_{h \to 0} \frac{|f(x_0 + h) - f(x_0) - f'(x_0) h|}{|h|} = 0$ by definition of $f'(x_0)$. As we pointed out in Example 7.2.1.1, every linear operator $L : \mathbb{R} \to \mathbb{R}$ is of the form $L(h) = a \cdot h$ for some $a \in \mathbb{R}$, and so linear operators $L : \mathbb{R} \to \mathbb{R}$ may be identified with real numbers $a \in \mathbb{R}$.

2. Suppose that $L : \mathbb{R}^n \to \mathbb{R}^m$ is a linear operator. Then it is differentiable, and $DL(x) = L$ for all $x \in \mathbb{R}^n$. To see that this is so, we need only show that there is a function $\varepsilon$ satisfying $\varepsilon(h) \to 0$ as $h \to 0$ so that

$$L(x + h) = L(x) + L(h) + \varepsilon(h) \|h\|$$

But since $L$ is linear, $L(x + h) = L(x) + L(h)$, so $\varepsilon = 0$ (the constant mapping) does the trick!

Thus the derivative of a linear operator, at any point, is itself. This shouldn't be surprising if you think about it in the right way: The best linear approximation of a linear function must surely be itself.

□ 

The contents of the previous example are worth stating explicitly: 

Proposition 7.2.11 Suppose that $L : \mathbb{R}^n \to \mathbb{R}^m$ is a linear operator. Then $L$ is differentiable, and $DL(x) = L$ for all $x \in \mathbb{R}^n$.

7.2.2 The Chain Rule 

We need to generalize the chain rule, product rule, etc. from one to higher dimensions. We 
begin with the most useful one: the chain rule. 

Example 7.2.12 In one dimensional calculus, the chain rule is stated as follows: Suppose that $y = y(x)$, $u = u(y)$ are real-valued functions of one variable. If $y$ is differentiable w.r.t. $x$ and $u$ is differentiable w.r.t. $y$, then $u$ is differentiable w.r.t. $x$, and

$$\frac{du}{dx} = \frac{du}{dy} \frac{dy}{dx}$$

Hopefully, you have by now matured enough (mathematically) to find this confusingly imprecise. Let's try again: If $y(x)$ is differentiable at $x_0$, and $u(y)$ is differentiable at $y_0 = y(x_0)$, then $u$ is differentiable w.r.t. $x$ at $x_0$, and

$$\frac{du}{dx}\Big|_{x_0} = \frac{du}{dy}\Big|_{y_0} \frac{dy}{dx}\Big|_{x_0} \qquad (*)$$

That's certainly better, but it's still not entirely clear. What's missing is the notion of composition: In order to differentiate $u$ w.r.t. $x$, we want to regard $u$ as a function of $x$, i.e. $u = u(y(x))$. In other words, the $u$ in $\frac{du}{dx}$ is the composition $u(y(x)) = (u \circ y)(x)$, whereas the $u$ in $\frac{du}{dy}$ is just $u(y)$: they're not even the same function!

When we look at it like this, we see that what (*) says is the following:

$$(u \circ y)'(x_0) = u'(y(x_0)) \cdot y'(x_0)$$

which is completely precise. Moreover, it admits generalization, as we now show.



□ 



Theorem 7.2.13 (Chain Rule) Suppose that $U \subseteq \mathbb{R}^n$ is a neighbourhood of $x_0$, and that $V \subseteq \mathbb{R}^m$ is a neighbourhood of $y_0$. Suppose further that $f : U \to V$ is differentiable at $x_0$ and that $y_0 = f(x_0)$, and that $g : V \to \mathbb{R}^p$ is differentiable at $y_0$. Then $g \circ f : U \to \mathbb{R}^p$ is differentiable at $x_0$ and

$$D(g \circ f)(x_0) = Dg(f(x_0)) \circ Df(x_0)$$



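Remark 7.2.14 Here $Df(x_0)$ is a linear operator $\mathbb{R}^n \to \mathbb{R}^m$, $Dg(f(x_0))$ is a linear operator $\mathbb{R}^m \to \mathbb{R}^p$, and $D(g \circ f)(x_0)$ is a linear operator $\mathbb{R}^n \to \mathbb{R}^p$. The preceding theorem says that the derivative

$$D(g \circ f)(x_0) : \mathbb{R}^n \to \mathbb{R}^p$$

is identical to the composition

$$\mathbb{R}^n \xrightarrow{\ Df(x_0)\ } \mathbb{R}^m \xrightarrow{\ Dg(f(x_0))\ } \mathbb{R}^p$$

i.e.

The derivative of the composition is the composition of the derivatives!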



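Proof of the Chain Rule: Let $L$ denote the linear operator $Df(x_0) : \mathbb{R}^n \to \mathbb{R}^m$, and $K$ denote the linear operator $Dg(f(x_0)) : \mathbb{R}^m \to \mathbb{R}^p$. Then $K \circ L : \mathbb{R}^n \to \mathbb{R}^p$ is certainly a linear operator (check this if it isn't obvious to you!). To verify that $K \circ L = D(g \circ f)(x_0)$, we must show that

$$(g \circ f)(x_0 + h) = (g \circ f)(x_0) + (K \circ L)(h) + \varepsilon(h) \|h\|$$

for some function $\varepsilon : \mathbb{R}^n \to \mathbb{R}^p$ satisfying $\lim_{h \to 0} \varepsilon(h) = 0$. We know, however, that there are functions $\varepsilon_1, \varepsilon_2$ so that

$$f(x_0 + h) = f(x_0) + L(h) + \varepsilon_1(h) \|h\|$$
$$g(f(x_0) + k) = g(f(x_0)) + K(k) + \varepsilon_2(k) \|k\|$$

where $\varepsilon_1(h) \to 0$ as $h \to 0$ and $\varepsilon_2(k) \to 0$ as $k \to 0$. Thus

$$(g \circ f)(x_0 + h) = g(f(x_0 + h))$$
$$= g\big(f(x_0) + L(h) + \varepsilon_1(h) \|h\|\big)$$
$$= g(f(x_0)) + K\big(L(h) + \varepsilon_1(h) \|h\|\big) + \varepsilon_2\big(L(h) + \varepsilon_1(h) \|h\|\big) \, \big\|L(h) + \varepsilon_1(h) \|h\|\big\|$$
$$= g(f(x_0)) + (K \circ L)(h) + \varepsilon(h) \|h\|$$

where

$$\varepsilon(h) \|h\| = K(\varepsilon_1(h)) \|h\| + \varepsilon_2\big(L(h) + \varepsilon_1(h) \|h\|\big) \, \big\|L(h) + \varepsilon_1(h) \|h\|\big\|$$

To show that $\varepsilon(h) \to 0$ as $h \to 0$, it suffices to show that

$$K(\varepsilon_1(h)) \to 0 \quad \text{as } h \to 0$$

and

$$\varepsilon_2\big(L(h) + \varepsilon_1(h) \|h\|\big) \, \frac{\big\|L(h) + \varepsilon_1(h) \|h\|\big\|}{\|h\|} \to 0 \quad \text{as } h \to 0$$

The first of these is easy: By Lemma 7.1.2 there is a constant $\alpha$ so that $\|K(x)\| \le \alpha \|x\|$ for all $x \in \mathbb{R}^m$. In particular,

$$\|K(\varepsilon_1(h))\| \le \alpha \|\varepsilon_1(h)\| \to 0 \quad \text{as } h \to 0, \text{ because } \varepsilon_1(h) \to 0$$

To prove the second, let $\beta$ be a constant so that $\|L(x)\| \le \beta \|x\|$ for all $x \in \mathbb{R}^n$. Then

$$\frac{\big\|L(h) + \varepsilon_1(h) \|h\|\big\|}{\|h\|} \le \frac{\|L(h)\|}{\|h\|} + \|\varepsilon_1(h)\| \le \beta + \|\varepsilon_1(h)\|$$

Thus

$$\Big\|\varepsilon_2\big(L(h) + \varepsilon_1(h) \|h\|\big)\Big\| \, \frac{\big\|L(h) + \varepsilon_1(h) \|h\|\big\|}{\|h\|} \le \Big\|\varepsilon_2\big(L(h) + \varepsilon_1(h) \|h\|\big)\Big\| \, \big(\beta + \|\varepsilon_1(h)\|\big)$$

which is easily seen to converge to $0$ as $h \to 0$, because $L(h) + \varepsilon_1(h) \|h\| \to 0$ as $h \to 0$.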
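As a sanity check of the Chain Rule in matrix form, the following sketch compares the product of two hand-computed Jacobian matrices with a finite-difference approximation of the derivative of the composition. The maps `f` and `g` are hypothetical examples chosen for this illustration, not functions from the text, and the central-difference step is an arbitrary choice.

```python
def f(x, y):
    """Hypothetical map R^2 -> R^2 used only for illustration."""
    return (x * y, x + y * y)


def g(u, v):
    """Hypothetical map R^2 -> R used only for illustration."""
    return u * u + 3 * v


def Df(x, y):
    """Jacobian matrix of f (2 x 2), differentiated by hand."""
    return [[y, x], [1.0, 2 * y]]


def Dg(u, v):
    """Jacobian matrix of g (1 x 2), differentiated by hand."""
    return [[2 * u, 3.0]]


def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def numeric_gradient(F, x, y, step=1e-6):
    """Central-difference approximation to the two partial derivatives of F."""
    return [(F(x + step, y) - F(x - step, y)) / (2 * step),
            (F(x, y + step) - F(x, y - step)) / (2 * step)]


x0, y0 = 1.5, -0.5
# Dg(f(x0)) composed with Df(x0), computed as a matrix product.
chain = matmul(Dg(*f(x0, y0)), Df(x0, y0))[0]
# Direct numerical differentiation of the composition g o f.
numeric = numeric_gradient(lambda x, y: g(*f(x, y)), x0, y0)

print("chain rule :", chain)
print("numerical  :", numeric)
```

The two rows agree up to the discretisation error of the difference quotients, as the theorem predicts.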



We now have a beautiful definition of the derivative: The derivative of a function $f : \mathbb{R}^n \to \mathbb{R}^m$ at a point $x \in \mathbb{R}^n$ embodies the notion of linearization: it is, in essence, the linear transformation $\mathbb{R}^n \to \mathbb{R}^m$ which best approximates how $f$ changes near $x$.

Unfortunately, this doesn't tell us how to calculate it...

Recall, from linear algebra, that any linear transformation $L : \mathbb{R}^n \to \mathbb{R}^m$ is essentially the same as a matrix. More precisely, $L$ has an $m \times n$-matrix representation with respect to the standard bases, so that

$$L_{ji} = j\text{th component of } L(e_i) \qquad i = 1, \dots, n; \ j = 1, \dots, m$$

where $e_i$ is the $i$th standard basis vector. It follows that $Df(x)$ is essentially an $m \times n$-matrix, the Jacobian matrix of $f$ at $x$:

$$Df(x)_{ji} = j\text{th component of } Df(x)(e_i) \qquad i = 1, \dots, n; \ j = 1, \dots, m$$

The question is: How do we calculate these entries?

First, let's reduce the dimension of the problem, by looking at components.



7.2.3 Components 

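In this section, we look at functions $f : \mathbb{R}^n \to \mathbb{R}^m$ in terms of their components. Every product $A_1 \times \dots \times A_m$ of sets has associated with it the projection mappings $\pi_1, \dots, \pi_m$. $\pi_j$ picks out the $j$th component, i.e.

$$\pi_j : A_1 \times \dots \times A_m \to A_j : (a_1, \dots, a_m) \mapsto a_j \qquad \text{for } j = 1, \dots, m$$

Consider now a special case: If we equip $\mathbb{R}^m$ with the standard basis, then every $\mathbf{x} \in \mathbb{R}^m$ has a representation $\mathbf{x} = (x^1, \dots, x^m)^T$. The projection mappings from $\mathbb{R}^m = \mathbb{R} \times \dots \times \mathbb{R}$ ($m$ times) to $\mathbb{R}$ are therefore given by

$$\pi^j : \mathbb{R}^m \to \mathbb{R} : \mathbf{x} \mapsto x^j \qquad \text{for } j = 1, \dots, m$$

Thus

$$\mathbf{x} = \begin{pmatrix} \pi^1 \mathbf{x} \\ \vdots \\ \pi^m \mathbf{x} \end{pmatrix}$$

These $\pi^j$ are clearly linear operators $\mathbb{R}^m \to \mathbb{R}$. (CHECK!!!)
The following lemma is obvious: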



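Lemma 7.2.15 A sequence in $\mathbb{R}^m$ converges if and only if each of its component sequences converges, i.e.

$\mathbf{x}_n \to \mathbf{x}$ in $\mathbb{R}^m$ if and only if for all $j = 1, \dots, m$ we have $\pi^j \mathbf{x}_n \to \pi^j \mathbf{x}$ in $\mathbb{R}$ (i.e. $x^j_n \to x^j$).

Proof: If $\mathbf{x}_n \to \mathbf{x}$, then $\pi^j \mathbf{x}_n \to \pi^j \mathbf{x}$, because the $\pi^j$ are linear, and therefore continuous (cf. Corollary 7.1.3).

Conversely, suppose that each $x^j_n \to x^j$, and let $\varepsilon > 0$. By definition of "$x^j_n \to x^j$", it is possible to choose, for each $j = 1, \dots, m$, an $N_j \in \mathbb{N}$ such that

$$n \ge N_j \implies |x^j_n - x^j| < \frac{\varepsilon}{m}$$

Let $N := \max\{N_1, \dots, N_m\}$. Then

$$n \ge N \implies |x^j_n - x^j| < \frac{\varepsilon}{m} \quad \text{for all } j = 1, \dots, m \text{ simultaneously}$$

(because $n \ge N$ implies also $n \ge N_j$). Then

$$\|\mathbf{x}_n - \mathbf{x}\| = \sqrt{\sum_{j=1}^m |x^j_n - x^j|^2} \le \sum_{j=1}^m |x^j_n - x^j| < \varepsilon$$

whenever $n \ge N$.

⊣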



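Suppose now that $f : \mathbb{R}^n \to \mathbb{R}^m$. For $j = 1, \dots, m$, define $f^j : \mathbb{R}^n \to \mathbb{R}$ by $f^j := \pi^j \circ f$.

Example 7.2.16 If $f : \mathbb{R}^2 \to \mathbb{R}^3 : (x, y)^T \mapsto (x + y, \ x^2 + y^2, \ -x - y^2)^T$, then

$$f^1(x, y) = x + y, \qquad f^2(x, y) = x^2 + y^2, \qquad f^3(x, y) = -x - y^2$$

□

Thus every function $f : \mathbb{R}^n \to \mathbb{R}^m$ can be thought of as an $m$-dimensional "vector of functions":

$$f = \begin{pmatrix} f^1 \\ \vdots \\ f^m \end{pmatrix}$$

Proposition 7.2.17 $f : \mathbb{R}^n \to \mathbb{R}^m$ is continuous at $\mathbf{x}_0$ if and only if each of the component functions $f^j : \mathbb{R}^n \to \mathbb{R}$ is continuous at $\mathbf{x}_0$.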



Proof: ($\Rightarrow$): If $f$ is continuous at $\mathbf{x}_0$, so is the composition $f^j = \pi^j \circ f$: $\pi^j$ is linear, linear operators between Euclidean spaces are continuous (Corollary 7.1.3), and compositions of continuous functions are continuous.

($\Leftarrow$): Suppose that each $f^j$ is continuous at $\mathbf{x}_0$, for $j = 1, \dots, m$. To prove $f$ is continuous at $\mathbf{x}_0$ it suffices to show that if $\mathbf{x}_n \to \mathbf{x}_0$, then $f(\mathbf{x}_n) \to f(\mathbf{x}_0)$. Now we know that $f^j(\mathbf{x}_n) \to f^j(\mathbf{x}_0)$ for all $j = 1, \dots, m$, and hence $f(\mathbf{x}_n) \to f(\mathbf{x}_0)$ by Lemma 7.2.15.

⊣

7.2.4 Partial Derivatives and the Jacobian Matrix

In subsection 7.2.3 we saw that any function $f : \mathbb{R}^n \to \mathbb{R}^m$ can be written as a vector of real-valued functions $f = (f^1, \dots, f^m)^T$, where $f^j : \mathbb{R}^n \to \mathbb{R}$ is defined by $f^j = \pi^j \circ f$ for $j = 1, \dots, m$, and $\pi^j : \mathbb{R}^m \to \mathbb{R}$ is the $j$th projection map.



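Proposition 7.2.18 $f : \mathbb{R}^n \to \mathbb{R}^m$ is differentiable at $\mathbf{x}$ if and only if each $f^j = \pi^j \circ f : \mathbb{R}^n \to \mathbb{R}$ is differentiable at $\mathbf{x}$ (for $j = 1, \dots, m$). Then

$$Df^j(\mathbf{x}) = D(\pi^j \circ f)(\mathbf{x}) = \pi^j \circ Df(\mathbf{x})$$

i.e. for all $\mathbf{h} \in \mathbb{R}^n$ we have

$$Df(\mathbf{x})(\mathbf{h}) = \begin{pmatrix} Df^1(\mathbf{x})(\mathbf{h}) \\ \vdots \\ Df^m(\mathbf{x})(\mathbf{h}) \end{pmatrix}$$

i.e. the matrix representation of $Df(\mathbf{x})$ has as $j$th row the $n$-dimensional row vector which corresponds to the linear operator $Df^j(\mathbf{x}) : \mathbb{R}^n \to \mathbb{R}$.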

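Proof: First suppose that $f$ is differentiable at $\mathbf{x}$. We can see that each $f^j = \pi^j \circ f$ is differentiable at $\mathbf{x}$, as

$$Df^j(\mathbf{x}) = D\pi^j(f(\mathbf{x})) \circ Df(\mathbf{x}) = \pi^j \circ Df(\mathbf{x})$$

where we used the chain rule and Proposition 7.2.11, applied to the (linear) projections $\pi^j$.

Conversely, suppose that each $f^j$ is differentiable at $\mathbf{x}$, for $j = 1, \dots, m$, i.e. that

$$f^j(\mathbf{x} + \mathbf{h}) = f^j(\mathbf{x}) + Df^j(\mathbf{x})(\mathbf{h}) + \varepsilon^j(\mathbf{h}) \|\mathbf{h}\| \quad \text{where } \varepsilon^j(\mathbf{h}) \to 0 \text{ as } \mathbf{h} \to 0$$

Define $L : \mathbb{R}^n \to \mathbb{R}^m$ by

$$L(\mathbf{h}) := \begin{pmatrix} Df^1(\mathbf{x})(\mathbf{h}) \\ \vdots \\ Df^m(\mathbf{x})(\mathbf{h}) \end{pmatrix}$$

It is clear that $L$ is a linear operator. To see that $L = Df(\mathbf{x})$ we must show that

$$f(\mathbf{x} + \mathbf{h}) = f(\mathbf{x}) + L(\mathbf{h}) + \varepsilon(\mathbf{h}) \|\mathbf{h}\|$$

where $\varepsilon(\mathbf{h}) \to 0$ as $\mathbf{h} \to 0$. Now

$$f(\mathbf{x} + \mathbf{h}) = \begin{pmatrix} f^1(\mathbf{x} + \mathbf{h}) \\ \vdots \\ f^m(\mathbf{x} + \mathbf{h}) \end{pmatrix} = \begin{pmatrix} f^1(\mathbf{x}) + Df^1(\mathbf{x})(\mathbf{h}) + \varepsilon^1(\mathbf{h}) \|\mathbf{h}\| \\ \vdots \\ f^m(\mathbf{x}) + Df^m(\mathbf{x})(\mathbf{h}) + \varepsilon^m(\mathbf{h}) \|\mathbf{h}\| \end{pmatrix} = f(\mathbf{x}) + L(\mathbf{h}) + \varepsilon(\mathbf{h}) \|\mathbf{h}\|$$

where we define $\varepsilon(\mathbf{h}) := (\varepsilon^1(\mathbf{h}), \dots, \varepsilon^m(\mathbf{h}))^T$. Clearly, $\varepsilon(\mathbf{h}) \to 0$ in $\mathbb{R}^m$ iff each $\varepsilon^j(\mathbf{h}) \to 0$ in $\mathbb{R}$. It follows immediately that $f$ is differentiable at $\mathbf{x}$ and that $Df(\mathbf{x}) = L$, as asserted.

⊣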

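Armed with Proposition 7.2.18, we now have reduced the problem of finding the entries of the matrix $Df(\mathbf{x})$ to that of finding the entries of the vectors $Df^j(\mathbf{x})$:

$$Df(\mathbf{x})_{ji} = j\text{th component of } Df(\mathbf{x})(e_i) = Df^j(\mathbf{x})(e_i) \qquad i = 1, \dots, n; \ j = 1, \dots, m$$

To calculate the entries of the matrix $Df(\mathbf{x})$, we therefore need to investigate $Df^j(\mathbf{x})(e_i)$.

Now each $f^j$ is a map from $\mathbb{R}^n$ to $\mathbb{R}$. Consider, therefore, a function $g : \mathbb{R}^n \to \mathbb{R}$ which is differentiable at $\mathbf{x}$. By definition,

$$g(\mathbf{x} + te_i) = g(\mathbf{x}) + Dg(\mathbf{x})(te_i) + \varepsilon(te_i) \|te_i\|$$

where $\varepsilon(\mathbf{h}) \to 0$ as $\mathbf{h} \to 0$. By linearity and some trivial calculations, this reduces to

$$g(\mathbf{x} + te_i) = g(\mathbf{x}) + t \, Dg(\mathbf{x})(e_i) + |t| \, \varepsilon(te_i)$$

Rearranging, we obtain

$$\frac{g(\mathbf{x} + te_i) - g(\mathbf{x})}{t} = Dg(\mathbf{x})(e_i) + \frac{|t|}{t} \varepsilon(te_i)$$

where $\varepsilon(te_i) \to 0$ as $t \to 0$ (because then $te_i \to 0$). Taking limits, we see that

$$Dg(\mathbf{x})(e_i) = \lim_{t \to 0} \frac{g(\mathbf{x} + te_i) - g(\mathbf{x})}{t}$$

i.e. that

$$Dg(\mathbf{x})(e_i) = \lim_{t \to 0} \frac{g(x^1, \dots, x^i + t, \dots, x^n) - g(x^1, \dots, x^n)}{t}$$

The limit on the right ought to be very familiar: by definition

$$\lim_{t \to 0} \frac{g(x^1, \dots, x^i + t, \dots, x^n) - g(x^1, \dots, x^n)}{t} = \frac{\partial g}{\partial x^i}\Big|_{\mathbf{x}}$$

i.e.

$$Dg(\mathbf{x})(e_i) = \frac{\partial g}{\partial x^i}\Big|_{\mathbf{x}}$$

Thus:

Proposition 7.2.19 If $f : \mathbb{R}^n \to \mathbb{R}$ is differentiable at $\mathbf{x}$, then

$$Df(\mathbf{x}) = \left( \frac{\partial f}{\partial x^1}\Big|_{\mathbf{x}} \ \ \frac{\partial f}{\partial x^2}\Big|_{\mathbf{x}} \ \ \dots \ \ \frac{\partial f}{\partial x^n}\Big|_{\mathbf{x}} \right) = \big( \partial_1 f(\mathbf{x}) \ \ \partial_2 f(\mathbf{x}) \ \ \dots \ \ \partial_n f(\mathbf{x}) \big)$$

(i.e. $Df(\mathbf{x})$ is just the gradient $\operatorname{grad} f = \nabla f$, but without the commas).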



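It follows immediately that

$$Df(\mathbf{x})_{ji} = Df^j(\mathbf{x})(e_i) = \frac{\partial f^j}{\partial x^i}\Big|_{\mathbf{x}}$$

and thus that:

Theorem 7.2.20 Suppose that $f : U \to \mathbb{R}^m$, where $U \subseteq \mathbb{R}^n$ is a neighbourhood of $\mathbf{x}$, is differentiable at $\mathbf{x}$. Then each $\frac{\partial f^j}{\partial x^i}$ exists at $\mathbf{x}$ (for $i = 1, \dots, n$ and $j = 1, \dots, m$) and the linear operator $Df(\mathbf{x})$ has an $m \times n$-matrix representation (w.r.t. the standard bases)

$$Df(\mathbf{x}) = \begin{pmatrix} \frac{\partial f^1}{\partial x^1} & \frac{\partial f^1}{\partial x^2} & \cdots & \frac{\partial f^1}{\partial x^n} \\ \frac{\partial f^2}{\partial x^1} & \frac{\partial f^2}{\partial x^2} & \cdots & \frac{\partial f^2}{\partial x^n} \\ \vdots & \vdots & & \vdots \\ \frac{\partial f^m}{\partial x^1} & \frac{\partial f^m}{\partial x^2} & \cdots & \frac{\partial f^m}{\partial x^n} \end{pmatrix}$$

where the partial derivatives are evaluated at the point $\mathbf{x}$.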
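Theorem 7.2.20 also suggests a simple way to approximate the Jacobian matrix numerically: replace each partial derivative by a difference quotient. The sketch below is one possible illustration, not a robust implementation; the helper `jacobian`, the step size, and the test map `example` are all assumptions introduced here.

```python
def jacobian(f, x, step=1e-6):
    """Approximate the m x n matrix of partial derivatives of f at x.

    f maps a list of n floats to a list of m floats; entry [j][i] approximates
    the partial derivative of the j-th component w.r.t. the i-th variable,
    using central differences.
    """
    n = len(x)
    m = len(f(x))
    J = [[0.0] * n for _ in range(m)]
    for i in range(n):
        forward = list(x)
        forward[i] += step
        backward = list(x)
        backward[i] -= step
        f_plus, f_minus = f(forward), f(backward)
        for j in range(m):
            J[j][i] = (f_plus[j] - f_minus[j]) / (2 * step)
    return J


def example(x):
    """Hypothetical map R^2 -> R^3, chosen only for illustration."""
    return [x[0] + x[1], x[0] * x[1], x[0] ** 2 - x[1]]


J = jacobian(example, [2.0, 3.0])
for row in J:
    print([round(entry, 6) for entry in row])
```

For this map the exact Jacobian at $(2, 3)$, computed by hand, has rows $(1 \ \ 1)$, $(3 \ \ 2)$ and $(4 \ \ {-1})$, and the printed approximation matches it to the displayed precision. Note that the existence of these difference-quotient limits alone does not guarantee differentiability.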



7.2.5 A Sufficient Condition for the Existence of Df{x) 

Later, we will encounter an exercise that shows that the converse of Theorem 7.2.20 is not true: There are functions $f$ which are not differentiable at a point $\mathbf{x}$, but for which all the partial derivatives of $f$ exist at $\mathbf{x}$. If we impose some mild continuity conditions on the partial derivatives, however, then a converse of Theorem 7.2.20 is true. To prove it, we will need the Mean Value Theorem from elementary calculus. In fact, the Mean Value Theorem is one of the most important results in analysis, and we shall make heavy use of it later, when we prove Taylor's Theorem. It states that:



Theorem 7.2.21 If $f : [a,b] \to \mathbb{R}$ is continuous on $[a,b]$ and differentiable on $(a,b)$, then there is $c \in (a,b)$ so that

$$f(b) - f(a) = f'(c)(b - a)$$

We revise the proof in the next exercise:

Exercise 7.2.22 (a) First prove Rolle's Theorem: If $\varphi : [a,b] \to \mathbb{R}$ is continuous on $[a,b]$, differentiable on $(a,b)$, and $\varphi(a) = \varphi(b) = 0$, then there is $c \in (a,b)$ so that $\varphi'(c) = 0$.

[Hint: $\varphi$ must have a maximum or a minimum on $[a,b]$. At least one of these extrema must occur at an interior point $c \in (a,b)$.]

(b) Now prove the Mean Value Theorem by applying Rolle's Theorem to

$$\varphi(x) := f(x) - f(a) - \frac{f(b) - f(a)}{b - a}(x - a)$$



Definition 7.2.23 Let $U \subseteq \mathbb{R}^n$ be a neighbourhood of $\mathbf{x}$. A function $f : U \to \mathbb{R}^m$ is said to be continuously differentiable at $\mathbf{x}$ if and only if there is an open neighbourhood $V \subseteq U$ of $\mathbf{x}$ such that:

(i) all the partial derivatives $\partial_i f^j$ exist in $V$ (for $i = 1, \dots, n$ and $j = 1, \dots, m$);

(ii) all the partial derivatives $\partial_i f^j$ are continuous at $\mathbf{x}$.

Theorem 7.2.24 Suppose that $U \subseteq \mathbb{R}^n$ is a neighbourhood of $\mathbf{x}$. If $f : U \to \mathbb{R}^m$ is continuously differentiable at $\mathbf{x}$, then $f$ is differentiable at $\mathbf{x}$.



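Proof: By Proposition 7.2.18, it suffices to prove the result for functions $f : \mathbb{R}^n \to \mathbb{R}$.

Consider now the telescoping expansion

$$f(\mathbf{x} + \mathbf{h}) - f(\mathbf{x})$$
$$= f(x^1 + h^1, x^2, x^3, \dots, x^n) - f(x^1, x^2, x^3, \dots, x^n)$$
$$+ f(x^1 + h^1, x^2 + h^2, x^3, \dots, x^n) - f(x^1 + h^1, x^2, x^3, \dots, x^n)$$
$$+ \cdots$$
$$+ f(x^1 + h^1, x^2 + h^2, \dots, x^{n-1} + h^{n-1}, x^n + h^n) - f(x^1 + h^1, x^2 + h^2, \dots, x^{n-1} + h^{n-1}, x^n)$$

Let $g_1 : \mathbb{R} \to \mathbb{R} : y \mapsto f(y, x^2, \dots, x^n)$. Then $g_1' = \partial_1 f$. By the Mean Value Theorem, there is $b_1$ between $x^1$ and $x^1 + h^1$ so that $g_1'(b_1) h^1 = g_1(x^1 + h^1) - g_1(x^1)$, i.e. so that

$$f(x^1 + h^1, x^2, \dots, x^n) - f(x^1, x^2, \dots, x^n) = \partial_1 f(\mathbf{c}_1) h^1 \quad \text{for } \mathbf{c}_1 := (b_1, x^2, x^3, \dots, x^n)$$

Define $g_2 : \mathbb{R} \to \mathbb{R} : y \mapsto f(x^1 + h^1, y, x^3, \dots, x^n)$. Then $g_2' = \partial_2 f$. By the Mean Value Theorem again, there is $b_2$ between $x^2$ and $x^2 + h^2$ so that $g_2'(b_2) h^2 = g_2(x^2 + h^2) - g_2(x^2)$, i.e. so that

$$f(x^1 + h^1, x^2 + h^2, x^3, \dots, x^n) - f(x^1 + h^1, x^2, x^3, \dots, x^n) = \partial_2 f(\mathbf{c}_2) h^2 \quad \text{for } \mathbf{c}_2 := (x^1 + h^1, b_2, x^3, \dots, x^n)$$

Continuing in this way we see that there is $\mathbf{c}_i = (x^1 + h^1, \dots, x^{i-1} + h^{i-1}, b_i, x^{i+1}, \dots, x^n)$, where $b_i$ lies between $x^i$ and $x^i + h^i$, so that the $i$th row of the telescoping expansion of $f(\mathbf{x} + \mathbf{h}) - f(\mathbf{x})$ is $\partial_i f(\mathbf{c}_i) h^i$. Define the linear operator $L : \mathbb{R}^n \to \mathbb{R}$ by

$$L(\mathbf{h}) := \sum_{i=1}^n \partial_i f(\mathbf{x}) h^i$$

Then

$$f(\mathbf{x} + \mathbf{h}) - f(\mathbf{x}) = \sum_{i=1}^n \partial_i f(\mathbf{c}_i) h^i = L(\mathbf{h}) + \varepsilon(\mathbf{h}) \|\mathbf{h}\|$$

where

$$\varepsilon(\mathbf{h}) := \frac{1}{\|\mathbf{h}\|} \sum_{i=1}^n \big(\partial_i f(\mathbf{c}_i) - \partial_i f(\mathbf{x})\big) h^i$$

But

$$|\varepsilon(\mathbf{h})| \le \sum_{i=1}^n \big|\partial_i f(\mathbf{c}_i) - \partial_i f(\mathbf{x})\big| \frac{|h^i|}{\|\mathbf{h}\|} \le \sum_{i=1}^n \big|\partial_i f(\mathbf{c}_i) - \partial_i f(\mathbf{x})\big|$$

Since $\mathbf{c}_i \to \mathbf{x}$ as $\mathbf{h} \to 0$, and since each $\partial_i f$ is continuous at $\mathbf{x}$, we have $\partial_i f(\mathbf{c}_i) \to \partial_i f(\mathbf{x})$ as $\mathbf{h} \to 0$. It follows that $\varepsilon(\mathbf{h}) \to 0$ as $\mathbf{h} \to 0$, and thus that $L = Df(\mathbf{x})$.

⊣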

Exercise 7.2.25 Consider the function $f : \mathbb{R}^2 \to \mathbb{R}$ defined by

if (x, J/) ^(0,0) 



f{x,y) := { x^+y^ 

if (x, J/) = (0,0) 

(a) Show that $f$ is differentiable (i.e. that $Df(x, y)$ exists) at all points $(x, y) \ne (0, 0)$.

(b) Show that $f$ is not differentiable at $(0, 0)$.

□ 




7.2.6 The Chain Rule: Reprise 

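Let's have another look at the chain rule: Recall (or verify, if you don't recall) that if $T : \mathbb{R}^n \to \mathbb{R}^m$ and $S : \mathbb{R}^m \to \mathbb{R}^p$ are linear operators, then so is their composition

$$S \circ T : \mathbb{R}^n \to \mathbb{R}^p : \mathbf{x} \mapsto S(T(\mathbf{x}))$$

The matrix representation of $S \circ T$ is the product of the matrix representations of $S$ and $T$. (To see this, denote for the moment the matrix representation of $T$ (w.r.t. the standard bases) by $[T]$, so that $T(\mathbf{x})$ is the matrix $\times$ vector product $[T]\mathbf{x}$, etc. Then

$$(S \circ T)(\mathbf{x}) = [S \circ T]\mathbf{x}$$

but also

$$(S \circ T)(\mathbf{x}) = S(T(\mathbf{x})) = [S][T]\mathbf{x}$$

so that $[S \circ T] = [S][T]$, as asserted.)

We apply this result to a composition: Let $f : \mathbb{R}^n \to \mathbb{R}^m$ and $g : \mathbb{R}^m \to \mathbb{R}^p$ be differentiable. Then

$$D(g \circ f)(\mathbf{x}) = Dg(f(\mathbf{x})) \circ Df(\mathbf{x})$$
$$[D(g \circ f)(\mathbf{x})] = [Dg(f(\mathbf{x}))][Df(\mathbf{x})]$$

Thus:

The derivative of a composition is the product of the derivatives!

It follows that for $i = 1, \dots, n$ and $k = 1, \dots, p$ we have

$$\partial_i (g \circ f)^k(\mathbf{x}) = [D(g \circ f)(\mathbf{x})]_{ki}$$
$$= \sum_{j=1}^m [Dg(f(\mathbf{x}))]_{kj} [Df(\mathbf{x})]_{ji}$$
$$= \sum_{j=1}^m \partial_j g^k(f(\mathbf{x})) \, \partial_i f^j(\mathbf{x})$$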

In particular: 

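Proposition 7.2.26 If $f = (f^1, \dots, f^m)^T : \mathbb{R}^n \to \mathbb{R}^m$ is continuously differentiable at $\mathbf{x}$ and $g : \mathbb{R}^m \to \mathbb{R}$ is differentiable at $(f^1(\mathbf{x}), \dots, f^m(\mathbf{x}))^T$, then $g \circ f$ is differentiable at $\mathbf{x}$ and

$$\partial_i (g \circ f)(\mathbf{x}) = \sum_{j=1}^m \partial_j g\big(f^1(\mathbf{x}), \dots, f^m(\mathbf{x})\big) \, \partial_i f^j(\mathbf{x})$$

In a course on advanced calculus, this is usually (and sometimes usefully) phrased as follows: Suppose that $y^j = f^j(\mathbf{x})$ are functions $\mathbb{R}^n \to \mathbb{R}$ which are continuously differentiable at $\mathbf{x}$ for $j = 1, \dots, m$ and that $g(y^1, \dots, y^m)$ is a function $\mathbb{R}^m \to \mathbb{R}$ which is differentiable at $\mathbf{y} = (f^1(\mathbf{x}), \dots, f^m(\mathbf{x}))^T$. Then $g(y^1(\mathbf{x}), \dots, y^m(\mathbf{x}))$ is differentiable at $\mathbf{x}$, and

$$\frac{\partial g}{\partial x^i} = \sum_{j=1}^m \frac{\partial g}{\partial y^j} \frac{\partial y^j}{\partial x^i}$$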

7.2.7 Further Manipulation Rules 

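Here's a nice proof that the derivative of a sum equals the sum of the derivatives:

Proposition 7.2.27 If $f, g : \mathbb{R}^n \to \mathbb{R}^m$ are differentiable at $x$, then $D(f + g)(x) = Df(x) + Dg(x)$.

Proof: Note that $f + g : \mathbb{R}^n \to \mathbb{R}^m$ is the composition of the function $h : \mathbb{R}^n \to \mathbb{R}^m \times \mathbb{R}^m$ with the function $s : \mathbb{R}^m \times \mathbb{R}^m \to \mathbb{R}^m$, where

$$h(x) := \begin{pmatrix} f(x) \\ g(x) \end{pmatrix} \quad \text{and} \quad s\begin{pmatrix} y \\ z \end{pmatrix} := y + z$$

i.e. $(f + g)(x) = (s \circ h)(x)$. Now $s$ is linear, so that $Ds(h(x)) = s$. Furthermore, by Proposition 7.2.18, we see that $Dh(x) = \begin{pmatrix} Df(x) \\ Dg(x) \end{pmatrix}$. By the chain rule

$$D(f + g)(x) = Ds(h(x)) \circ Dh(x) = s\begin{pmatrix} Df(x) \\ Dg(x) \end{pmatrix} = Df(x) + Dg(x)$$

⊣

Exercise 7.2.28 Prove Proposition 7.2.27 directly from the definition of the derivative.

□

[Henceforth, we will not always distinguish between row vectors and column vectors. Thus $Df(x, y)$ will mean the same thing as $Df\big((x, y)^T\big)$.]

Next, we tackle the multidimensional analogue of Leibniz' Rule (or the Product Rule) for differentiation. First, we need a lemma:

Lemma 7.2.29 If $p : \mathbb{R}^m \times \mathbb{R}^m \to \mathbb{R}$ is defined by

$$p(x, y) = \langle x, y \rangle$$

(where $\langle \cdot, \cdot \rangle$ is the standard inner product on $\mathbb{R}^m$), then $p$ is differentiable, and

$$Dp(x, y)(h, k) = \langle x, k \rangle + \langle y, h \rangle$$

Proof: Fix $(x, y) \in \mathbb{R}^m \times \mathbb{R}^m$. The map $L : \mathbb{R}^m \times \mathbb{R}^m \to \mathbb{R}$ defined by

$$L(h, k) := \langle x, k \rangle + \langle y, h \rangle$$

is easily seen to be linear:

$$L(\alpha_1 (h_1, k_1) + \alpha_2 (h_2, k_2)) = L(\alpha_1 h_1 + \alpha_2 h_2, \ \alpha_1 k_1 + \alpha_2 k_2)$$
$$= \langle x, \alpha_1 k_1 + \alpha_2 k_2 \rangle + \langle y, \alpha_1 h_1 + \alpha_2 h_2 \rangle$$
$$= \alpha_1 \langle x, k_1 \rangle + \alpha_2 \langle x, k_2 \rangle + \alpha_1 \langle y, h_1 \rangle + \alpha_2 \langle y, h_2 \rangle$$
$$= \alpha_1 L(h_1, k_1) + \alpha_2 L(h_2, k_2)$$

Now

$$p(x + h, y + k) = \langle x + h, y + k \rangle$$
$$= \langle x, y \rangle + \langle x, k \rangle + \langle y, h \rangle + \langle h, k \rangle$$
$$= p(x, y) + L(h, k) + \varepsilon(h, k) \|(h, k)\|$$

where

$$\varepsilon(h, k) := \frac{\langle h, k \rangle}{\|(h, k)\|}$$

Now $\|(h, k)\|^2 = \sum_{i=1}^m [(h^i)^2 + (k^i)^2] = \|h\|^2 + \|k\|^2 \ge \|h\|^2$. By the Cauchy-Schwarz inequality $|\langle h, k \rangle| \le \|h\| \|k\|$, and so

$$|\varepsilon(h, k)| \le \frac{\|h\| \|k\|}{\|(h, k)\|} \le \|k\|$$

and hence $\varepsilon(h, k) \to 0$ as $(h, k) \to 0$. Hence $L = Dp(x, y)$, as required.

⊣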



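Proposition 7.2.30 If $f, g : \mathbb{R}^n \to \mathbb{R}^m$ are differentiable at $x$, then $\langle f, g \rangle : \mathbb{R}^n \to \mathbb{R}$ is differentiable at $x$, and

$$D\langle f, g \rangle(x) = \langle f(x), Dg(x) \rangle + \langle g(x), Df(x) \rangle$$

i.e.

$$D\langle f, g \rangle(x)(h) = \langle Df(x)(h), g(x) \rangle + \langle f(x), Dg(x)(h) \rangle$$

Proof: If $q : \mathbb{R}^n \to \mathbb{R}^m \times \mathbb{R}^m$ and $p : \mathbb{R}^m \times \mathbb{R}^m \to \mathbb{R}$ are defined by

$$q(x) := (f(x), g(x)), \qquad p(x, y) = \langle x, y \rangle$$

then $\langle f, g \rangle = p \circ q$. Now $Dq(x) = (Df(x), Dg(x))$ by Proposition 7.2.18, and $Dp(x, y)(h, k) = \langle x, k \rangle + \langle y, h \rangle$ by the preceding lemma. Thus, by the chain rule, we have

$$D\langle f, g \rangle(x)(h) = D(p \circ q)(x)(h)$$
$$= Dp(q(x)) \circ Dq(x)(h)$$
$$= Dp(f(x), g(x))\big(Df(x)(h), Dg(x)(h)\big)$$
$$= \langle f(x), Dg(x)(h) \rangle + \langle g(x), Df(x)(h) \rangle$$

⊣

The inner product in $\mathbb{R}$ is just multiplication: $\langle x, y \rangle = xy$. Hence:

Corollary 7.2.31 If $f, g : \mathbb{R}^n \to \mathbb{R}$ are differentiable at $x$, then

$$D(f \cdot g)(x) = f(x) Dg(x) + g(x) Df(x)$$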



7.2.8 Directional Derivatives 



Definition 7.2.32 Let $U \subseteq \mathbb{R}^n$ be a neighbourhood of $\mathbf{x}_0$, and let $f : U \to \mathbb{R}$. If $\mathbf{v} \in \mathbb{R}^n$, then the limit (if it exists)

$$D_{\mathbf{v}} f(\mathbf{x}_0) := \lim_{t \to 0} \frac{f(\mathbf{x}_0 + t\mathbf{v}) - f(\mathbf{x}_0)}{t}$$

is called the directional derivative (or Gateaux derivative) of $f$ at $\mathbf{x}_0$ in the direction $\mathbf{v}$. (Here, the limit is taken over $t \in \mathbb{R}$.)

Thus: If $\mathbf{v}$ is a unit vector, then $D_{\mathbf{v}} f(\mathbf{x}_0)$ is the instantaneous rate of change of the function $f$ in the direction $\mathbf{v}$ at the point $\mathbf{x}_0$.

Remarks 7.2.33 1. Note that

$$D_{\mathbf{v}} f(\mathbf{x}_0) = \frac{d}{dt} f(\mathbf{x}_0 + t\mathbf{v}) \Big|_{t=0}$$

(If you don't see this, just define $g(t) := f(\mathbf{x}_0 + t\mathbf{v})$ and write down the definition of the derivative $g'(0)$.)

2. Also note that if $e_i$ is the $i$th standard basis vector, then

$$D_{e_i} f(\mathbf{x}_0) = \partial_i f(\mathbf{x}_0)$$

i.e. that the $i$th partial derivative is just the directional derivative in the direction of the $i$th basis vector.

3. If $\mathbf{v}$ is the zero vector, then $D_{\mathbf{v}} f(\mathbf{x}_0)$ is uninteresting, always being equal to zero. So when we talk about directional derivatives, we implicitly assume that the direction vector $\mathbf{v}$ is not the zero vector.

□ 

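The next proposition shows how directional derivatives may be obtained from the (Fréchet) derivative $Df(\mathbf{x}_0)$:

Proposition 7.2.34 Suppose that $U \subseteq \mathbb{R}^n$ is a neighbourhood of $\mathbf{x}_0$, and that $f : U \to \mathbb{R}$. If $f$ is differentiable at $\mathbf{x}_0$, then all the directional derivatives $D_{\mathbf{v}} f(\mathbf{x}_0)$ exist (for all $\mathbf{v} \in \mathbb{R}^n$ with $\mathbf{v} \ne 0$), and

$$D_{\mathbf{v}} f(\mathbf{x}_0) = Df(\mathbf{x}_0)(\mathbf{v})$$

Proof: By definition of the Fréchet derivative, we have

$$f(\mathbf{x}_0 + t\mathbf{v}) = f(\mathbf{x}_0) + t \, Df(\mathbf{x}_0)(\mathbf{v}) + \varepsilon(t\mathbf{v}) \|t\mathbf{v}\|$$

where $\varepsilon(\mathbf{h}) \to 0$ as $\mathbf{h} \to 0$. Rearranging, we see that

$$\left| \frac{f(\mathbf{x}_0 + t\mathbf{v}) - f(\mathbf{x}_0)}{t} - Df(\mathbf{x}_0)(\mathbf{v}) \right| = |\varepsilon(t\mathbf{v})| \, \|\mathbf{v}\| \to 0 \quad \text{as } t \to 0$$

Hence $\lim_{t \to 0} \frac{f(\mathbf{x}_0 + t\mathbf{v}) - f(\mathbf{x}_0)}{t} = Df(\mathbf{x}_0)(\mathbf{v})$, as asserted.

⊣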
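Proposition 7.2.34 can be illustrated with a short computation. In the sketch below the function `f`, the point `x0` and the direction `v` are hypothetical choices made for this example only; the gradient is computed by hand, and the difference quotient from Definition 7.2.32 is compared with $Df(\mathbf{x}_0)(\mathbf{v})$.

```python
def f(x, y):
    """Hypothetical differentiable function R^2 -> R, for illustration only."""
    return x * x * y


def gradient(x, y):
    """Partial derivatives of f, differentiated by hand."""
    return (2 * x * y, x * x)


def directional_quotient(x0, v, t):
    """Difference quotient (f(x0 + t v) - f(x0)) / t from Definition 7.2.32."""
    return (f(x0[0] + t * v[0], x0[1] + t * v[1]) - f(*x0)) / t


x0, v = (1.0, 2.0), (3.0, -1.0)
grad = gradient(*x0)
# Df(x0)(v): the 1 x 2 matrix of partial derivatives applied to v.
exact = grad[0] * v[0] + grad[1] * v[1]

for t in (1e-1, 1e-3, 1e-5):
    print(f"t = {t:g}: quotient = {directional_quotient(x0, v, t):.6f}, Df(x0)(v) = {exact}")
```

As $t$ shrinks, the quotient approaches $Df(\mathbf{x}_0)(\mathbf{v}) = 11$, in agreement with the proposition.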



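Remarks 7.2.35 Consider a function $f : \mathbb{R}^2 \to \mathbb{R}$. The graph of $f$ is a surface $z = f(\mathbf{x})$ in $\mathbb{R}^3$ (where $\mathbf{x} = (x, y)$), and the point $(\mathbf{x}_0, f(\mathbf{x}_0))$ lies on the graph. If $\mathbf{v}$ is a non-zero vector in $\mathbb{R}^2$, then the instantaneous rate of change in the direction of $\mathbf{v}$ is $D_{\mathbf{v}} f(\mathbf{x}_0)$, i.e.

$$f(\mathbf{x}_0 + t\mathbf{v}) \approx f(\mathbf{x}_0) + t D_{\mathbf{v}} f(\mathbf{x}_0)$$

for small $t$. So

$$z = f(\mathbf{x}_0) + t D_{\mathbf{v}} f(\mathbf{x}_0) \qquad t \in \mathbb{R}$$

gives the tangent to the surface at the point $\mathbf{x}_0$ in the direction $\mathbf{v}$. The collection of all these tangent lines forms the tangent plane, and the tangent plane may therefore be described by the equation

$$z = f(\mathbf{x}_0) + Df(\mathbf{x}_0)(\mathbf{x} - \mathbf{x}_0)$$

i.e.

$$z = f(x_0, y_0) + \partial_1 f(x_0, y_0) \cdot (x - x_0) + \partial_2 f(x_0, y_0) \cdot (y - y_0)$$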

□ 






Note, however, that the converse of Proposition 7.2.34 is not true: It is possible for all
directional derivatives of $f$ to exist at a point $\mathbf{x}_0$, and yet for $f$ not to be differentiable at $\mathbf{x}_0$.
In particular, it is possible for all partial derivatives of $f$ to exist at $\mathbf{x}_0$, and yet for $f$ to be
non-differentiable at $\mathbf{x}_0$. This is the content of the next example:



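Example 7.2.36 Consider the function $f : \mathbb{R}^2 \to \mathbb{R}$ defined by

\[ f(x,y) := \begin{cases} \dfrac{xy}{x^2 + y} & \text{if } x^2 \neq -y \\ 0 & \text{if } x^2 = -y \end{cases} \]

If $\mathbf{v} = (v_1, v_2)$, then

\[ \frac{f(\mathbf{0} + t\mathbf{v}) - f(\mathbf{0})}{t} = \frac{1}{t} \cdot \frac{t^2 v_1 v_2}{t^2 v_1^2 + t v_2} = \frac{v_1 v_2}{t v_1^2 + v_2} \]

Taking the limit as $t \to 0$ we see that

\[ D_{\mathbf{v}} f(\mathbf{0}) = \begin{cases} v_1 & \text{if } v_2 \neq 0 \\ 0 & \text{if } v_2 = 0 \end{cases} \]

Hence all directional derivatives exist at $\mathbf{0}$.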

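However, $f$ is not continuous at $\mathbf{0}$, and therefore not differentiable there: For example, $\mathbf{x}_n = \left(\frac{1}{n}, -\frac{1}{n^2} - \frac{1}{n^4}\right)$ has
$\mathbf{x}_n \to \mathbf{0}$. Yet

\[ f(\mathbf{x}_n) = \frac{-\frac{1}{n^3} - \frac{1}{n^5}}{-\frac{1}{n^4}} = n + \frac{1}{n} \to \infty \quad \text{as } n \to \infty \]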

□ 
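The behaviour in this example can be observed numerically. The sketch below assumes the function is $f(x,y) = xy/(x^2+y)$ off the parabola $y = -x^2$ (and $0$ on it); the directions `v` and the sequence approaching the origin along a curve close to that parabola are choices made here for illustration only.

```python
def f(x, y):
    # Assumed form of the function in Example 7.2.36.
    if x * x == -y:
        return 0.0
    return x * y / (x * x + y)

def difference_quotient(v, t):
    # (f(0 + t v) - f(0)) / t for a direction v = (v1, v2).
    return (f(t * v[0], t * v[1]) - f(0.0, 0.0)) / t

# The quotients settle down to v1 when v2 != 0, and to 0 when v2 == 0.
for v in [(1.0, 2.0), (-3.0, 0.5), (2.0, 0.0)]:
    print(v, [difference_quotient(v, 10.0 ** -k) for k in (2, 4, 6)])

# Along points hugging the parabola y = -x^2 the values of f blow up,
# even though the points themselves converge to the origin.
for n in (10, 100, 1000):
    x, y = 1.0 / n, -1.0 / n**2 - 1.0 / n**4
    print(n, f(x, y))
```

So every straight line through the origin sees a well-behaved function, while a curved approach reveals the discontinuity.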



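Exercise 7.2.37 Let $A \subseteq \mathbb{R}^2$ be given by

\[ A := \{(x,y) : x > 0 \text{ and } 0 < y < x^2\} \]

Define $f : \mathbb{R}^2 \to \mathbb{R}$ by

\[ f(x,y) := \begin{cases} 1 & \text{if } (x,y) \in A \\ 0 & \text{else} \end{cases} \]

(a) Show that $D_{\mathbf{v}} f(0,0) = 0$ for all $\mathbf{v} \in \mathbb{R}^2$.

(b) Show that $f$ is not differentiable at $(0,0)$.

[Hints: (a) First show that any straight line through $(0,0)$ contains a line segment about $(0,0)$ which lies
wholly in the complement of $A$. Then show that for any $\mathbf{v} \neq \mathbf{0}$ there is $\delta > 0$ so that $f(t\mathbf{v}) = 0$ whenever
$|t| < \delta$.

(b) Show that $f$ is not continuous at $(0,0)$.]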



□ 



7.3 Taylor Theorems 

7.3.1 Taylor's Theorem for One Variable

We introduce a number of Taylor Theorems in one dimension. As these are probably already 
familiar, we omit discussion of these results. 

Before we continue, recall the Mean Value Theorem, which is Theorem 7.2.21. 






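Theorem 7.3.1 (Taylor's Theorem)

Let $n \in \mathbb{N}$. Suppose that $f : [a,b] \to \mathbb{R}$ is a function with the property that

(i) $f, f', f'', \dots, f^{(n-1)}$ are defined and continuous on $[a,b]$;

(ii) $f^{(n)}$ exists on $(a,b)$.

Suppose further that $\alpha, \beta$ are real numbers such that $a \le \alpha < \beta \le b$. Then there exists a
$\gamma \in (\alpha, \beta)$ such that

\[ f(\beta) = \sum_{k=0}^{n-1} \frac{f^{(k)}(\alpha)}{k!} (\beta - \alpha)^k + \frac{f^{(n)}(\gamma)}{n!} (\beta - \alpha)^n \]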



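Proof: For $x \in [a,b]$, define

\[ P(x) := \sum_{k=0}^{n-1} \frac{f^{(k)}(\alpha)}{k!} (x - \alpha)^k \]

We must show that there is a $\gamma \in (\alpha,\beta)$ such that $f(\beta) = P(\beta) + \frac{f^{(n)}(\gamma)}{n!}(\beta - \alpha)^n$. Now let $M$ be the number defined by

\[ f(\beta) = P(\beta) + M(\beta - \alpha)^n \]

and define

\[ g(x) = f(x) - P(x) - M(x - \alpha)^n \qquad (x \in [a,b]) \]

Note that $g(\beta) = 0$, by definition of $M$. Note also that

\[ g^{(k)}(\alpha) = f^{(k)}(\alpha) - P^{(k)}(\alpha) = 0, \qquad k = 0, 1, \dots, n-1 \]

because $P^{(k)}(\alpha) = f^{(k)}(\alpha)$ for such $k$.

We must show that $M = \frac{f^{(n)}(\gamma)}{n!}$ for some $\gamma$ between $\alpha$ and $\beta$. Note that $P(x)$ is a
polynomial in $x$ of degree at most $n-1$, so that $P^{(n)}(x) = 0$. It follows that

\[ g^{(n)}(x) = f^{(n)}(x) - n!\,M \]

Thus if we can find a $\gamma$ such that $g^{(n)}(\gamma) = 0$, we will have $M = \frac{f^{(n)}(\gamma)}{n!}$, as required.

Now both $g(\alpha) = 0$ and $g(\beta) = 0$. By the Mean Value Theorem, there is $\gamma_1 \in (\alpha, \beta)$ such
that $g'(\gamma_1) = 0$. Thus both $g'(\alpha) = 0$ and $g'(\gamma_1) = 0$. By the Mean Value Theorem, there is
$\gamma_2 \in (\alpha, \gamma_1)$ such that $g''(\gamma_2) = 0$. Thus both $g''(\alpha) = 0$ and $g''(\gamma_2) = 0$. By the Mean Value
Theorem, there is $\gamma_3$ . . .

After $n - 1$ steps, we obtain, from the fact that both $g^{(n-1)}(\alpha) = 0$ and $g^{(n-1)}(\gamma_{n-1}) = 0$,
a $\gamma_n \in (\alpha, \gamma_{n-1})$ such that $g^{(n)}(\gamma_n) = 0$.

$\gamma_n$ is therefore the $\gamma$ that we seek.

⊣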

We leave the proofs of the following results as a very important exercise: 






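Theorem 7.3.2 (Taylor's Theorem) If $f : (x - a, x + a) \to \mathbb{R}$ is $n$ times differentiable,
with $|f^{(n)}(t)| \le M$ for all $t \in (x - a, x + a)$, then

\[ \left| f(t) - \sum_{j=0}^{n-1} \frac{f^{(j)}(x)}{j!} (t - x)^j \right| \le \frac{M |t - x|^n}{n!} \]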


□ 
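The remainder bound of Theorem 7.3.2 is easy to test on a concrete function. The sketch below is an illustration only: it uses $f = \sin$ expanded about $x = 0$, for which every derivative is bounded by $M = 1$; the helper names `taylor_sin` and `remainder_bound` are hypothetical.

```python
import math

def taylor_sin(t, n):
    # Taylor polynomial of sin about 0 built from derivatives of order 0..n-1.
    # The j-th derivative of sin at 0 cycles through 0, 1, 0, -1.
    cycle = (0.0, 1.0, 0.0, -1.0)
    return sum(cycle[j % 4] * t ** j / math.factorial(j) for j in range(n))

def remainder_bound(t, n, M=1.0):
    # Right-hand side M |t - x|^n / n! of Theorem 7.3.2, with x = 0.
    return M * abs(t) ** n / math.factorial(n)

t = 0.8
for n in (1, 3, 5, 7):
    err = abs(math.sin(t) - taylor_sin(t, n))
    print(n, err, remainder_bound(t, n))
```

In each printed row the actual error should not exceed the bound, and both shrink rapidly as $n$ grows.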



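Theorem 7.3.3 (Taylor's Theorem) If $f : (x - a, x + a) \to \mathbb{R}$ is $n$ times differentiable,
and $f^{(n+1)}(x)$ exists, then

\[ f(t) = \sum_{j=0}^{n+1} \frac{f^{(j)}(x)}{j!} (t - x)^j + \varepsilon(t)\,|t - x|^{n+1} \quad \text{where } \varepsilon(t) \to 0 \text{ as } t \to x \]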



□ 



Exercise 7.3.4 We prove the preceding theorems. (This exercise is from A Companion to Analysis, by
T.W. Körner, publ. Amer. Math. Soc. (2004))

Without loss of generality, we may assume that $x = 0$. Consider two functions $f, g : (-a, a) \to \mathbb{R}$, where $a > 0$.

(a) Suppose that $f, g$ are differentiable, that $f(0) = g(0)$, and that $f'(t) \le g'(t)$ for all $0 \le t < a$. Show that
$f(t) \le g(t)$ for all $0 \le t < a$.

(b) Suppose that $f(0) = 0$, and that $|f'(t)| \le |t|^r$ for all $t \in (-a, a)$. Show that $|f(t)| \le |t|^{r+1}/(r+1)$ for all
$|t| < a$.

(c) Suppose that $g$ is $n$ times differentiable, with $|g^{(n)}(t)| \le M$ for all $t \in (-a, a)$, and that $g(0) = g'(0) =
g''(0) = \dots = g^{(n-1)}(0) = 0$. Use (b) $n$ times to show that

\[ |g(t)| \le \frac{M |t|^n}{n!} \quad \text{for all } |t| < a \]

(d) Suppose that $f$ is $n$ times differentiable, with $|f^{(n)}(t)| \le M$ for all $t \in (-a, a)$. Show that

\[ \left| f(t) - \sum_{j=0}^{n-1} \frac{f^{(j)}(0)}{j!} t^j \right| \le \frac{M |t|^n}{n!} \quad \text{for all } |t| < a \]

This proves Theorem 7.3.2.

[Hint: Set $g(t) := f(t) - \sum_{j=0}^{n-1} \frac{f^{(j)}(0)}{j!} t^j$ and apply (c).]

(e) Suppose that $g$ is $n$ times differentiable in $(-a, a)$, and $n + 1$ times differentiable at $0$. Further assume
that $g(0) = g'(0) = g''(0) = \dots = g^{(n+1)}(0) = 0$. Show, using (c), that

\[ |g(t)| \le \eta(t)\,|t|^{n+1} \quad \text{where } \eta(t) \to 0 \text{ as } t \to 0 \]

[Hint: First use the definition of the derivative to explain why $g^{(n)}(u) = \varepsilon(u)u$ for some function $\varepsilon$ such that
$\varepsilon(u) \to 0$ as $u \to 0$. Then set $\eta(t) := \sup_{-t \le v \le t} |\varepsilon(v)|$ and set $M_t := \eta(t)|t|$. Show that $|g^{(n)}(u)| \le M_t$
for all $u \in (-t, t)$.]

(f) If $f$ is $n$ times differentiable in $(-a, a)$, and $n + 1$ times differentiable at $0$, show that

\[ \left| f(t) - \sum_{j=0}^{n+1} \frac{f^{(j)}(0)}{j!} t^j \right| \le \eta(t)\,|t|^{n+1} \quad \text{where } \eta(t) \to 0 \text{ as } t \to 0 \]

This proves Theorem 7.3.3.



□ 






7.3.2 A Higher-Order Taylor Theorem 

For the one-dimensional Taylor Theorems, we employed the higher derivatives $f''(x), f^{(3)}(x),
\dots, f^{(n)}(x)$ of the function $f : \mathbb{R} \to \mathbb{R}$. If $f : \mathbb{R}^n \to \mathbb{R}$, it is not yet clear what kind of objects
the higher derivatives $D^2 f(\mathbf{x}), D^3 f(\mathbf{x}), \dots, D^n f(\mathbf{x})$ actually are. We know that $Df(\mathbf{x})$ is a
linear map. It will turn out later that $D^2 f(\mathbf{x})$ is a bilinear map, and that, in general, the
$D^k f(\mathbf{x})$ are multilinear mappings from $\mathbb{R}^n \times \dots \times \mathbb{R}^n$ ($k$ times) to $\mathbb{R}$. We will attempt to do without the
use of these objects.

Remarks 7.3.5 In what follows, we will use the Mean Value Theorem over and over, in the following
manner: Suppose that $f : \mathbb{R}^2 \to \mathbb{R}$ is differentiable at the point $\mathbf{x} = (x, y)$. We will be interested in how much
the function $f$ can change in a neighbourhood of $\mathbf{x}$, i.e., with $\mathbf{h} = (h, k)$, we will be interested in finding bounds
for the expression

\[ S := |f(\mathbf{x} + \mathbf{h}) - f(\mathbf{x})| = |f(x + h, y + k) - f(x, y)| \]

Now clearly,

\[ S = |f(x + h, y + k) - f(x, y + k) + f(x, y + k) - f(x, y)| \le |f(x + h, y + k) - f(x, y + k)| + |f(x, y + k) - f(x, y)| \]

and so we will find ourselves looking at expressions of the form $f(x, y + k) - f(x, y)$ or $f(x + h, y) - f(x, y)$.
If we fix $x$, then $g(y) := f(x, y)$ is a function of $y$, and

\[ g'(y) = f_{,2}(x, y) \]

Now $f(x, y + k) - f(x, y) = g(y + k) - g(y)$, so by the Mean Value Theorem there is $k^*$ between $0$ and $k$ so that

\[ g(y + k) - g(y) = g'(y + k^*)\,k \]

i.e.

\[ f(x, y + k) - f(x, y) = f_{,2}(x, y + k^*)\,k \]

Similarly, if we keep $y + k$ fixed, and apply the Mean Value Theorem to $g(x) := f(x, y + k)$, we see that

\[ f(x + h, y + k) - f(x, y + k) = f_{,1}(x + h^*, y + k)\,h \]

Putting this together, we see that

\[ S \le |f_{,1}(x + h^*, y + k)|\,|h| + |f_{,2}(x, y + k^*)|\,|k| \]

Thus knowledge of the size of partial derivatives in a neighbourhood of $(x, y)$ will give us information about
the size of $S$, the change in $f$.

□ 

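We'll begin by looking at the second-order Taylor expansion of a function of two variables
$f : \mathbb{R}^2 \to \mathbb{R}$. One important fact that we will state upfront is the following, though you
no doubt already know it:

\[ \frac{\partial^2 f}{\partial x^i \partial x^j} = \frac{\partial^2 f}{\partial x^j \partial x^i} \]

when these derivatives are continuous. To simplify notation, we define

\[ f_{,i} := \frac{\partial f}{\partial x^i}, \qquad f_{,ij} := \frac{\partial^2 f}{\partial x^i \partial x^j} \]

etc.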



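Theorem 7.3.6 Suppose that $f : U \subseteq \mathbb{R}^n \to \mathbb{R}$ is twice differentiable on the open set $U$,
and that the functions $\frac{\partial^2 f}{\partial x^i \partial x^j}$ are continuous. Then

\[ \frac{\partial^2 f}{\partial x^i \partial x^j} = \frac{\partial^2 f}{\partial x^j \partial x^i} \]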






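Proof:¹ For simplicity, by holding all other variables fixed, we may assume that $f : U \subseteq \mathbb{R}^2 \to \mathbb{R}$.
Let $\mathbf{x} = (x, y) \in U$ and let $\mathbf{h} = (h, k)$. Consider

\[ S_{h,k} := [f(x + h, y + k) - f(x + h, y)] - [f(x, y + k) - f(x, y)] \]

If we define

\[ g_k(x) := f(x, y + k) - f(x, y) \]

we see that

\[ S_{h,k} = g_k(x + h) - g_k(x) \]

By the Mean Value Theorem there is $c_{k,h} \in (x, x + h)$ so that

\[ S_{h,k} = g_k'(c_{k,h}) \cdot h \]

and hence

\[ S_{h,k} = \left[ \frac{\partial f}{\partial x}(c_{k,h}, y + k) - \frac{\partial f}{\partial x}(c_{k,h}, y) \right] \cdot h \]

By the Mean Value Theorem, again, we see that

\[ S_{h,k} = \frac{\partial^2 f}{\partial y \partial x}(c_{k,h}, d_{k,h})\, hk \]

for some $d_{k,h} \in (y, y + k)$.
Now note that also

\[ S_{h,k} = [f(x + h, y + k) - f(x, y + k)] - [f(x + h, y) - f(x, y)] = g_h(y + k) - g_h(y) \]

where $g_h(y) := f(x + h, y) - f(x, y)$. In exactly the same way as above, we can find $d_{h,k} \in
(y, y + k)$ and then $c_{h,k} \in (x, x + h)$ so that

\[ S_{h,k} = \frac{\partial^2 f}{\partial x \partial y}(c_{h,k}, d_{h,k})\, kh \]

Equating the two expressions for $S_{h,k}$ and dividing by $hk \neq 0$, we see that

\[ \frac{\partial^2 f}{\partial y \partial x}(c_{k,h}, d_{k,h}) = \frac{\partial^2 f}{\partial x \partial y}(c_{h,k}, d_{h,k}) \]

As this is true for all $h, k \neq 0$, it remains true if we let $h, k \to 0$, by continuity of the partial
derivatives. Then $c_{k,h}, c_{h,k} \to x$ and $d_{k,h}, d_{h,k} \to y$, and so

\[ \frac{\partial^2 f}{\partial x \partial y}(x, y) = \frac{\partial^2 f}{\partial y \partial x}(x, y) \]

as claimed.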
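The second difference $S_{h,k}$ used in this proof doubles as a finite-difference approximation of the mixed partial derivative, since $S_{h,k}/(hk)$ converges to it. The sketch below is illustrative only: the test function `f` and its hand-computed mixed partial `mixed_partial_exact` are hypothetical choices, not taken from the notes.

```python
import math

def f(x, y):
    # Hypothetical smooth test function; any C^2 function would do.
    return math.exp(x * y) + x ** 3 * y

def second_difference(func, x, y, h, k):
    # S_{h,k} = [f(x+h, y+k) - f(x+h, y)] - [f(x, y+k) - f(x, y)]
    return (func(x + h, y + k) - func(x + h, y)) - (func(x, y + k) - func(x, y))

def mixed_partial_exact(x, y):
    # d^2 f / dx dy computed by hand: (1 + x*y) * exp(x*y) + 3*x^2.
    return (1.0 + x * y) * math.exp(x * y) + 3.0 * x ** 2

x, y = 0.7, -0.4
for step in (1e-1, 1e-2, 1e-3):
    approx = second_difference(f, x, y, step, step) / (step * step)
    print(step, approx, mixed_partial_exact(x, y))
```

Because $S_{h,k}$ is symmetric in the roles of the two variables, the same quotient approximates both orders of differentiation, which is the heart of the argument above.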



We can now write down the promised second-order Taylor Theorem. 



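Theorem 7.3.7 Suppose that $U \subseteq \mathbb{R}^2$ is a neighbourhood of $\mathbf{x}_0$, and that $f : U \to \mathbb{R}$. If the second-order partial derivatives
of $f$ exist in $U$ and are continuous at $\mathbf{x}_0 = (x, y)$, and if $\mathbf{h} = (h, k)$, then

\[ f(\mathbf{x}_0 + \mathbf{h}) = f(\mathbf{x}_0) + f_{,1}(\mathbf{x}_0)h + f_{,2}(\mathbf{x}_0)k + \tfrac{1}{2}\left( f_{,11}(\mathbf{x}_0)h^2 + 2 f_{,12}(\mathbf{x}_0)hk + f_{,22}(\mathbf{x}_0)k^2 \right) + \varepsilon(\mathbf{h})\|\mathbf{h}\|^2 \]

where $\varepsilon(\mathbf{h}) \to 0$ as $\mathbf{h} \to \mathbf{0}$.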



¹Taken from Elementary Classical Analysis, by J.E. Marsden.






To prove this second-order Taylor Theorem, we need a peculiar lemma:



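Lemma 7.3.8 Suppose that $U \subseteq \mathbb{R}^2$ is a neighbourhood of $\mathbf{0}$, and that $f : U \to \mathbb{R}$ is a
function whose second partial derivatives exist in $U$, and are continuous at $\mathbf{0}$. Suppose in
addition that

\[ 0 = f(0,0) = f_{,1}(0,0) = f_{,2}(0,0) = f_{,11}(0,0) = f_{,12}(0,0) = f_{,22}(0,0) \]

Then

\[ \frac{f(\mathbf{h})}{\|\mathbf{h}\|^2} \to 0 \quad \text{as } \mathbf{h} \to \mathbf{0} \]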



Remarks 7.3.9 Suppose that $f$ is as in the Lemma. The second order Taylor expansion of $f$ about $(0,0)$
is then given by

\[ f(h, k) = f(0,0) + \left( f_{,1}(0,0)h + f_{,2}(0,0)k \right) + \tfrac{1}{2}\left( f_{,11}(0,0)h^2 + 2 f_{,12}(0,0)hk + f_{,22}(0,0)k^2 \right) + \varepsilon(\mathbf{h})\|\mathbf{h}\|^2 \]

i.e.

\[ f(\mathbf{h}) = \varepsilon(\mathbf{h})\|\mathbf{h}\|^2 \]

as the function and all the derivatives are zero at $(0,0)$. Thus $\varepsilon(\mathbf{h}) = \frac{f(\mathbf{h})}{\|\mathbf{h}\|^2}$. To prove that $\frac{f(\mathbf{h})}{\|\mathbf{h}\|^2} \to 0$ is thus
to prove that $\varepsilon(\mathbf{h}) \to 0$ (as $\mathbf{h} \to \mathbf{0}$).

In other words, to prove the Lemma is to prove the 2nd order Taylor Theorem about $(0,0)$ for functions
$f$ with the properties given in the statement of the Lemma. It turns out that the 2nd order Taylor Theorem
for an arbitrary function about an arbitrary point can easily be reduced to this case. So this Lemma does all
the hard work.



□ 



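Proof of Lemma 7.3.8: Write $\mathbf{h} = (h, k)$. Fix $\epsilon > 0$. Since the 2nd order partial derivatives
are continuous at $(0,0)$, there is $\delta > 0$ so that

\[ |f_{,11}(\mathbf{h})|,\ |f_{,12}(\mathbf{h})|,\ |f_{,22}(\mathbf{h})| < \epsilon \quad \text{whenever } \|\mathbf{h}\| < \delta \]

Applying the Mean Value Theorem to the function $g(k) = f_{,1}(h, k)$, we see that $g(k) - g(0) =
g'(k^*)k$ for some $k^* \in (0, k)$, and thus that $f_{,1}(h, k) - f_{,1}(h, 0) = f_{,12}(h, k^*)k$ and thus that

\[ |f_{,1}(h, k) - f_{,1}(h, 0)| \le \epsilon|k| \le \epsilon\|\mathbf{h}\| \quad \text{when } \|\mathbf{h}\| < \delta \]

A similar argument shows that

\[ |f_{,1}(h, 0) - f_{,1}(0, 0)| \le \epsilon|h| \le \epsilon\|\mathbf{h}\| \quad \text{when } \|\mathbf{h}\| < \delta \]

Now

\[ |f_{,1}(h, k)| = |f_{,1}(h, k) - f_{,1}(0, 0)| \le |f_{,1}(h, k) - f_{,1}(h, 0)| + |f_{,1}(h, 0) - f_{,1}(0, 0)| \le 2\epsilon\|\mathbf{h}\| \quad \text{when } \|\mathbf{h}\| < \delta \qquad (*) \]

Using the Mean Value Theorem again, we see that $f_{,2}(0, k) - f_{,2}(0, 0) = f_{,22}(0, \hat{k})k$ for some
$\hat{k} \in (0, k)$, and thus that

\[ |f_{,2}(0, k) - f_{,2}(0, 0)| \le \epsilon|k| \le \epsilon\|\mathbf{h}\| \quad \text{when } \|\mathbf{h}\| < \delta \qquad (**) \]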






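Similarly, $f(h, k) - f(0, k) = f_{,1}(\hat{h}, k)h$ for some $\hat{h} \in (0, h)$. But $\|(\hat{h}, k)\| \le \|(h, k)\|$, and thus
$|f_{,1}(\hat{h}, k)| \le 2\epsilon\|\mathbf{h}\|$ when $\|\mathbf{h}\| < \delta$, by $(*)$. It follows that

\[ |f(h, k) - f(0, k)| \le 2\epsilon\|\mathbf{h}\| \cdot |h| \le 2\epsilon\|\mathbf{h}\|^2 \quad \text{when } \|\mathbf{h}\| < \delta \qquad (\dagger) \]

Similarly, from $(**)$, we see that

\[ |f(0, k) - f(0, 0)| \le \epsilon\|\mathbf{h}\|^2 \quad \text{when } \|\mathbf{h}\| < \delta \qquad (\ddagger) \]

Putting $(\dagger)$, $(\ddagger)$ together, we see that

\[ |f(h, k)| = |f(h, k) - f(0, k) + f(0, k) - f(0, 0)| \le |f(h, k) - f(0, k)| + |f(0, k) - f(0, 0)| \le 3\epsilon\|\mathbf{h}\|^2 \quad \text{when } \|\mathbf{h}\| < \delta \]

and thus that

\[ \frac{|f(\mathbf{h})|}{\|\mathbf{h}\|^2} \le 3\epsilon \quad \text{when } 0 < \|\mathbf{h}\| < \delta \]

Since $\epsilon$ was arbitrary, it follows that $\frac{f(\mathbf{h})}{\|\mathbf{h}\|^2} \to 0$ as $\mathbf{h} \to \mathbf{0}$.

⊣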

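Proof of Theorem 7.3.7: We may assume that $\mathbf{x}_0 = \mathbf{0}$, by translation². Define $g : U \to \mathbb{R}$
by

\[ g(x, y) := f(x, y) - \left[ f(0,0) + f_{,1}(0,0)x + f_{,2}(0,0)y + \tfrac{1}{2}\left( f_{,11}(0,0)x^2 + 2 f_{,12}(0,0)xy + f_{,22}(0,0)y^2 \right) \right] \]

Then

\[ 0 = g(0,0) = g_{,1}(0,0) = g_{,2}(0,0) = g_{,11}(0,0) = g_{,12}(0,0) = g_{,22}(0,0) \]

By the preceding lemma, we see that

\[ \frac{g(\mathbf{h})}{\|\mathbf{h}\|^2} \to 0 \quad \text{as } \|\mathbf{h}\| \to 0 \]

It is now clear that the Theorem holds, with $\varepsilon(\mathbf{h}) := \frac{g(\mathbf{h})}{\|\mathbf{h}\|^2}$ and $\mathbf{x}_0 = \mathbf{0}$.

⊣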

Now that we have seen the form of a two-dimensional Taylor formula, it is not hard to
generalize it to higher dimensions:



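Theorem 7.3.10 Suppose that $U \subseteq \mathbb{R}^m$ is a neighbourhood of $\mathbf{x}_0$, and that $f : U \to \mathbb{R}$.
Suppose further that all the partial derivatives $f_{,i}, f_{,ij}, f_{,ijk}, \dots$ up to $n$th order exist in $U$,
and are continuous at $\mathbf{x}_0$. Then

\[ f(\mathbf{x}_0 + \mathbf{h}) = f(\mathbf{x}_0) + \sum_{i=1}^m f_{,i}(\mathbf{x}_0) h^i + \frac{1}{2!} \sum_{i=1}^m \sum_{j=1}^m f_{,ij}(\mathbf{x}_0) h^i h^j + \frac{1}{3!} \sum_{i=1}^m \sum_{j=1}^m \sum_{k=1}^m f_{,ijk}(\mathbf{x}_0) h^i h^j h^k + \cdots + \frac{1}{n!} (\text{sum up to } n\text{th order}) + \varepsilon(\mathbf{h})\|\mathbf{h}\|^n \]

where $\varepsilon(\mathbf{h}) \to 0$ as $\mathbf{h} \to \mathbf{0}$.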



²That is, define $g(\mathbf{x}) = f(\mathbf{x} + \mathbf{x}_0)$, so that $g(\mathbf{0}) = f(\mathbf{x}_0)$. Then redefine $f$ to be $g$.






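Proof: We just give an outline of the proof. All the important elements are contained in the
proof of Theorem 7.3.7, and the lemma that precedes it.
Consider the function

\[ g(\mathbf{h}) := f(\mathbf{x}_0 + \mathbf{h}) - \left[ f(\mathbf{x}_0) + \sum_{i=1}^m f_{,i}(\mathbf{x}_0) h^i + \frac{1}{2!} \sum_{i=1}^m \sum_{j=1}^m f_{,ij}(\mathbf{x}_0) h^i h^j + \frac{1}{3!} \sum_{i=1}^m \sum_{j=1}^m \sum_{k=1}^m f_{,ijk}(\mathbf{x}_0) h^i h^j h^k + \cdots + \frac{1}{n!} (\text{sum up to } n\text{th order}) \right] \]

We see that $g(\mathbf{0})$ and all the partial derivatives $g_{,i}(\mathbf{0}), g_{,ij}(\mathbf{0}), g_{,ijk}(\mathbf{0}), \dots$ up to $n$th order are
all equal to $0$.

Now we imitate the proof of Lemma 7.3.8: Fix $\epsilon > 0$, and, by continuity of the $n$th order
partial derivatives at $\mathbf{0}$, choose $\delta > 0$ so that all the $n$th order partial derivatives $g_{,i_1 \dots i_n}$ have

\[ |g_{,i_1 \dots i_n}(\mathbf{h})| < \epsilon \quad \text{when } \|\mathbf{h}\| < \delta \]

Apply the Mean Value Theorem to show that all the $(n-1)$th derivatives satisfy

\[ |g_{,j_1 \dots j_{n-1}}(\mathbf{h})| < C\epsilon\|\mathbf{h}\| \quad \text{when } \|\mathbf{h}\| < \delta \]

for some constant $C$. Then apply the Mean Value Theorem again to show that all the $(n-2)$th
derivatives satisfy

\[ |g_{,k_1 \dots k_{n-2}}(\mathbf{h})| < C\epsilon\|\mathbf{h}\|^2 \quad \text{when } \|\mathbf{h}\| < \delta \]

for some (different) constant $C$. Keep going, reducing the order of the partial derivatives step
by step, until one deduces that

\[ |g(\mathbf{h})| < C\epsilon\|\mathbf{h}\|^n \quad \text{when } \|\mathbf{h}\| < \delta \]

for some constant $C$. We then see that

\[ |\varepsilon(\mathbf{h})| := \frac{|g(\mathbf{h})|}{\|\mathbf{h}\|^n} < C\epsilon \quad \text{when } 0 < \|\mathbf{h}\| < \delta \]

Since $\epsilon$ was arbitrary, it follows that $\varepsilon(\mathbf{h}) \to 0$ as $\mathbf{h} \to \mathbf{0}$. By rearranging the definition of
$g(\mathbf{h})$, we obtain the desired $n$th order Taylor expansion of $f$ at the point $\mathbf{x}_0$.

⊣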

Finally, applying Theorem 7.3.10 to the components of a function $f : \mathbb{R}^m \to \mathbb{R}^p$, we obtain
the following straightforward extension:






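Theorem 7.3.11 Suppose that $U \subseteq \mathbb{R}^m$ is a neighbourhood of $\mathbf{x}_0$, and that $f =
(f^1, \dots, f^p) : U \to \mathbb{R}^p$. Suppose further that all the partial derivatives up to order $n$
exist in $U$ and are continuous at $\mathbf{x}_0$. Then for $l = 1, \dots, p$ we have

\[ f^l(\mathbf{x}_0 + \mathbf{h}) = f^l(\mathbf{x}_0) + \sum_{i=1}^m f^l_{,i}(\mathbf{x}_0) h^i + \frac{1}{2!} \sum_{i=1}^m \sum_{j=1}^m f^l_{,ij}(\mathbf{x}_0) h^i h^j + \frac{1}{3!} \sum_{i=1}^m \sum_{j=1}^m \sum_{k=1}^m f^l_{,ijk}(\mathbf{x}_0) h^i h^j h^k + \cdots + \frac{1}{n!} (\text{sum up to } n\text{th order}) + \varepsilon^l(\mathbf{h})\|\mathbf{h}\|^n \]

where $\varepsilon^l(\mathbf{h}) \to 0$ as $\mathbf{h} \to \mathbf{0}$.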



□ 



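We finish this subsection by briefly revisiting the first- and second-order Taylor expansions
of a function $f : \mathbb{R}^n \to \mathbb{R}$. We have seen that the derivative, when it exists, is given by the
Jacobian matrix:

\[ J_{\mathbf{x}_0}(f) = Df(\mathbf{x}_0) = \begin{pmatrix} f_{,1}(\mathbf{x}_0) & \dots & f_{,n}(\mathbf{x}_0) \end{pmatrix} \]

The first order Taylor expansion of $f$ about $\mathbf{x}_0$ is therefore given by

\[ f(\mathbf{x}_0 + \mathbf{h}) = f(\mathbf{x}_0) + J_{\mathbf{x}_0}(f)\mathbf{h} + \varepsilon(\mathbf{h})\|\mathbf{h}\| \]

Define the Hessian matrix of $f$ at $\mathbf{x}_0$ by

\[ H_{\mathbf{x}_0}(f) := \begin{pmatrix} f_{,11}(\mathbf{x}_0) & f_{,12}(\mathbf{x}_0) & \dots & f_{,1n}(\mathbf{x}_0) \\ f_{,21}(\mathbf{x}_0) & f_{,22}(\mathbf{x}_0) & \dots & f_{,2n}(\mathbf{x}_0) \\ \vdots & \vdots & & \vdots \\ f_{,n1}(\mathbf{x}_0) & f_{,n2}(\mathbf{x}_0) & \dots & f_{,nn}(\mathbf{x}_0) \end{pmatrix} = \left( \frac{\partial^2 f}{\partial x^i \partial x^j}(\mathbf{x}_0) \right)_{i,j=1}^{n} \]

By equality of the mixed partial derivatives, the Hessian is a symmetric matrix. Now note
that the second-order Taylor expansion of $f$ at $\mathbf{x}_0$ is given by

\[ f(\mathbf{x}_0 + \mathbf{h}) = f(\mathbf{x}_0) + J_{\mathbf{x}_0}(f)\mathbf{h} + \tfrac{1}{2}\mathbf{h}^{\mathrm{tr}} H_{\mathbf{x}_0}(f)\mathbf{h} + \varepsilon(\mathbf{h})\|\mathbf{h}\|^2 \]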
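To see the second-order expansion at work, one can compute the remainder quotient $\varepsilon(\mathbf{h})$ for a concrete function and watch it shrink. The following sketch is an illustration under stated assumptions: the function $f(x,y) = e^x \cos y$, its hand-computed gradient and Hessian, and the base point `x0` are hypothetical choices.

```python
import math

def f(x, y):
    # Hypothetical smooth test function.
    return math.exp(x) * math.cos(y)

def gradient(x, y):
    # Jacobian row (f_1, f_2), computed by hand.
    return (math.exp(x) * math.cos(y), -math.exp(x) * math.sin(y))

def hessian(x, y):
    # Symmetric matrix of second partial derivatives, computed by hand.
    e = math.exp(x)
    return ((e * math.cos(y), -e * math.sin(y)),
            (-e * math.sin(y), -e * math.cos(y)))

def taylor2(x0, h):
    # f(x0) + J h + (1/2) h^tr H h
    g, H = gradient(*x0), hessian(*x0)
    linear = g[0] * h[0] + g[1] * h[1]
    quad = sum(h[i] * H[i][j] * h[j] for i in range(2) for j in range(2))
    return f(*x0) + linear + 0.5 * quad

def epsilon(x0, h):
    # Remainder divided by ||h||^2; should tend to 0 as h -> 0.
    norm_sq = h[0] ** 2 + h[1] ** 2
    return (f(x0[0] + h[0], x0[1] + h[1]) - taylor2(x0, h)) / norm_sq

x0 = (0.3, 1.1)
for s in (1e-1, 1e-2, 1e-3):
    print(s, epsilon(x0, (s, -2.0 * s)))
```

The printed quotients should decrease roughly in proportion to the step size, consistent with $\varepsilon(\mathbf{h}) \to 0$.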



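Remarks 7.3.12 If $A$ is an $n \times n$-matrix, then the map $B : \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}$ defined by

\[ B(\mathbf{h}, \mathbf{k}) = \mathbf{h}^{\mathrm{tr}} A \mathbf{k} \]

is an example of a bilinear map, i.e. one which is linear in both components:

\[ B(\mathbf{x} + \mathbf{y}, \mathbf{k}) = B(\mathbf{x}, \mathbf{k}) + B(\mathbf{y}, \mathbf{k}) \qquad B(\mathbf{h}, \mathbf{x} + \mathbf{y}) = B(\mathbf{h}, \mathbf{x}) + B(\mathbf{h}, \mathbf{y}) \]
\[ B(\alpha\mathbf{x}, \mathbf{k}) = \alpha B(\mathbf{x}, \mathbf{k}) \qquad B(\mathbf{h}, \alpha\mathbf{x}) = \alpha B(\mathbf{h}, \mathbf{x}) \]

In particular, the map

\[ D^2 f(\mathbf{x}_0) : (\mathbf{h}, \mathbf{k}) \mapsto \mathbf{h}^{\mathrm{tr}} H_{\mathbf{x}_0}(f) \mathbf{k} \]

is a bilinear map, called the second derivative of $f$ at $\mathbf{x}_0$. It can be shown that it is, indeed, the derivative of
$Df$, when interpreted in the right way, but we will not pursue this further. We now have

\[ f(\mathbf{x}_0 + \mathbf{h}) = f(\mathbf{x}_0) + Df(\mathbf{x}_0)(\mathbf{h}) + \tfrac{1}{2} D^2 f(\mathbf{x}_0)(\mathbf{h}, \mathbf{h}) + \varepsilon(\mathbf{h})\|\mathbf{h}\|^2 \]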






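In a similar way, the third-order terms can be summarised by a trilinear map $D^3 f(\mathbf{x}_0)$, etc. (but such
multilinear maps cannot be represented by matrices). In any case, the higher-order Taylor Theorems can then
be written in the following form:

\[ f(\mathbf{x}_0 + \mathbf{h}) = f(\mathbf{x}_0) + Df(\mathbf{x}_0)(\mathbf{h}) + \frac{1}{2!} D^2 f(\mathbf{x}_0)(\mathbf{h}, \mathbf{h}) + \cdots + \frac{1}{n!} D^n f(\mathbf{x}_0)(\mathbf{h}, \dots, \mathbf{h}) + \varepsilon(\mathbf{h})\|\mathbf{h}\|^n \]

where $\varepsilon(\mathbf{h}) \to 0$ as $\mathbf{h} \to \mathbf{0}$, and each $D^k f(\mathbf{x}_0)$ is a $k$-multilinear mapping.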

□ 

7.4 Maxima and Minima 

7.4.1 Topological Facts about Extrema 

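Definition 7.4.1 Suppose that $C \subseteq \mathbb{R}^n$, that $f : C \to \mathbb{R}$, and that $\mathbf{x}_0 \in C$. If $f(\mathbf{x}_0) \ge
f(\mathbf{x})$ for all $\mathbf{x} \in C$, then we say that $\mathbf{x}_0$ is a global maximum of $f$ on $C$.

Moreover, if $f(\mathbf{x}_0) > f(\mathbf{x})$ for all $\mathbf{x} \in C$ with $\mathbf{x} \neq \mathbf{x}_0$, then we say that $\mathbf{x}_0$ is a strict global maximum
of $f$.

The notion of a (strict) global minimum is defined in a similar way.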

We begin with some general topological facts about maxima and minima. 

Lemma 7.4.2 If $C \subseteq \mathbb{R}$ is compact (and non-empty), then $C$ contains a maximum and a minimum element.

Exercise 7.4.3 Prove Lemma 7.4.2.

[Hint: Recall the Heine-Borel Theorem. Show that because $C$ is bounded, $\sup C < \infty$, and because $C$ is
closed, $\sup C \in C$.]

□ 



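Theorem 7.4.4 (a) Suppose that $K \subseteq \mathbb{R}^n$ is compact, and that $f : \mathbb{R}^n \to \mathbb{R}$ is continuous.
Then

\[ f[K] = \{f(\mathbf{x}) : \mathbf{x} \in K\} \text{ is compact in } \mathbb{R} \]

(b) If $f : \mathbb{R}^n \to \mathbb{R}$ is continuous and $K \subseteq \mathbb{R}^n$ is compact, then $f$ attains its global maximum
and minimum on $K$, i.e. there exist $k_{\min}, k_{\max} \in K$ such that

\[ f(k_{\min}) = \inf\{f(k) : k \in K\} \qquad f(k_{\max}) = \sup\{f(k) : k \in K\} \]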

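Proof: (a) Suppose that $\mathcal{U} = \{U_i : i \in I\}$ is an open cover of $f[K]$ in $\mathbb{R}$. We must show that
there exist finitely many $i_1, \dots, i_n \in I$ so that $f[K] \subseteq U_{i_1} \cup \dots \cup U_{i_n}$. As the sets $U_i$ are open
and $f$ is continuous, the sets $V_i := f^{-1}[U_i]$ are open in $\mathbb{R}^n$ (cf. Proposition ??). Now

\[ \mathbf{x} \in K \;\Rightarrow\; f(\mathbf{x}) \in f[K] \;\Rightarrow\; f(\mathbf{x}) \in \bigcup_{i \in I} U_i \;\Rightarrow\; \mathbf{x} \in f^{-1}\Big[\bigcup_{i \in I} U_i\Big] = \bigcup_{i \in I} V_i \]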







Thus $K \subseteq \bigcup_{i \in I} V_i$, i.e. $\{V_i : i \in I\}$ is an open cover of $K$.

As $K$ is compact, there are finitely many $i_1, \dots, i_n \in I$ so that $K \subseteq V_{i_1} \cup \dots \cup V_{i_n} =
f^{-1}[U_{i_1} \cup \dots \cup U_{i_n}]$. It is now easy to see that $f[K] \subseteq U_{i_1} \cup \dots \cup U_{i_n}$.

(b) By Lemma 7.4.2, $f[K]$ has a minimum and a maximum.

⊣

With these general topological facts out of the way, we can begin to study the maxima 
and minima of multivariate functions using calculus. 

7.4.2 Maxima and Minima via Calculus

Recall two familiar facts from elementary calculus:

Fact I. If $f : \mathbb{R} \to \mathbb{R}$ has a (local) extremum (i.e. maximum or minimum) at $x_0$, and if $f$ is
differentiable at $x_0$, then $f'(x_0) = 0$.

Fact II. We can say more if $f$ is twice-differentiable at $x_0$ and $f'(x_0) = 0$: If $f''(x_0) < 0$, then $x_0$ is a local
maximum, and if $f''(x_0) > 0$, it is a local minimum.

We want to generalize these results to functions $f : \mathbb{R}^n \to \mathbb{R}$.

A generalization of the first fact is rather easy to obtain, once we've made some definitions: 

Definition 7.4.5 (a) Suppose that $U \subseteq \mathbb{R}^n$ is open, that $f : U \to \mathbb{R}$, and that $\mathbf{x}_0 \in U$. If
there is a neighbourhood $V$ of $\mathbf{x}_0$ so that $V \subseteq U$ and

\[ f(\mathbf{x}_0) \ge f(\mathbf{x}) \quad \text{for all } \mathbf{x} \in V \]

then we say that $f$ has a local maximum at $\mathbf{x}_0$.

The notion of a local minimum is defined in a similar way.

(b) $\mathbf{x}_0$ is called an extreme point of $f$ iff it is either a local maximum or a local minimum.

(c) $\mathbf{x}_0$ is called a critical point of $f$ if $f$ is differentiable at $\mathbf{x}_0$ and $Df(\mathbf{x}_0) = 0$.

Thus $f$ has a local maximum at $\mathbf{x}_0$ iff there is a neighbourhood $V$ of $\mathbf{x}_0$ such that

\[ f(\mathbf{x}_0) = \max_{\mathbf{x} \in V} f(\mathbf{x}) \]

iff $\mathbf{x}_0$ is a global maximum of the restriction $f|_V$ of $f$ to $V$.
Here is the promised generalization of Fact I. 

Theorem 7.4.6 Suppose that $U \subseteq \mathbb{R}^n$ is an open set, and that $f : U \to \mathbb{R}$ is differentiable.
If $\mathbf{x}_0$ is an extreme point of $f$, then $\mathbf{x}_0$ is a critical point, i.e. $Df(\mathbf{x}_0) = 0$.

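Proof: Suppose that $Df(\mathbf{x}_0) \neq 0$. Then there is $\mathbf{v}$ so that $c := Df(\mathbf{x}_0)(\mathbf{v}) > 0$. As $U$ is
an open neighbourhood of $\mathbf{x}_0$, there is $r > 0$ such that the ball $B(\mathbf{x}_0, r\|\mathbf{v}\|)$ is a subset of $U$.
The map

\[ g : (-r, r) \to \mathbb{R} : t \mapsto f(\mathbf{x}_0 + t\mathbf{v}) \]

is now well-defined, and differentiable: Indeed, for $-r < t < r$, we have

\[ g'(t) = \lim_{h \to 0} \frac{g(t + h) - g(t)}{h} = \lim_{h \to 0} \frac{f(\mathbf{x}_0 + t\mathbf{v} + h\mathbf{v}) - f(\mathbf{x}_0 + t\mathbf{v})}{h} = D_{\mathbf{v}} f(\mathbf{x}_0 + t\mathbf{v}) = Df(\mathbf{x}_0 + t\mathbf{v})(\mathbf{v}) \]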






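which exists, because $\mathbf{x}_0 + t\mathbf{v} \in U$ when $|t| < r$, and because $f$ is differentiable on $U$. In
particular, $g'(0) = D_{\mathbf{v}} f(\mathbf{x}_0) = Df(\mathbf{x}_0)(\mathbf{v}) = c > 0$. By single-variable calculus, $0$ is neither
a local maximum nor a local minimum of $g$ on $(-r, r)$. It follows that, for any $\delta > 0$, there are
$t_1, t_2 \in (-\delta, \delta)$ so that

\[ g(t_1) < g(0) < g(t_2) \]

from which we see that

\[ f(\mathbf{x}_0 + t_1\mathbf{v}) < f(\mathbf{x}_0) < f(\mathbf{x}_0 + t_2\mathbf{v}) \]

Thus there are points $\mathbf{x}_i = \mathbf{x}_0 + t_i\mathbf{v}$ ($i = 1, 2$) lying arbitrarily close to $\mathbf{x}_0$ (within $\delta\|\mathbf{v}\|$ for
arbitrarily small $\delta$) for which $f(\mathbf{x}_1) < f(\mathbf{x}_0)$ and $f(\mathbf{x}_2) > f(\mathbf{x}_0)$. Thus $\mathbf{x}_0$ is neither a
local maximum nor a local minimum of $f$.

⊣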

Remarks 7.4.7 For functions $f : \mathbb{R}^2 \to \mathbb{R}$, the above result is easy to interpret graphically: We saw earlier
that the tangent plane to the surface $z = f(x, y)$ at the point $(x_0, y_0)$ has equation

\[ z = f(x_0, y_0) + \partial_1 f(x_0, y_0)(x - x_0) + \partial_2 f(x_0, y_0)(y - y_0) \]

(cf. Remarks 7.2.35.) If $(x_0, y_0)$ is a critical point of $f$, then $Df(x_0, y_0) = 0$ and so both partial derivatives
$\partial_1 f(x_0, y_0), \partial_2 f(x_0, y_0)$ are zero. Hence the equation of the tangent plane is $z = f(x_0, y_0) = $ constant, i.e. the
tangent plane is horizontal. The converse is also easily seen to be true: If the tangent plane to the graph of $f$
is horizontal at $(x_0, y_0)$, then $(x_0, y_0)$ is a critical point.

□ 

Having successfully generalized the first fact about maxima and minima from elementary
calculus, we now proceed with the generalization of Fact II: If $x_0$ is a critical point of $f$ and
$f''(x_0) > 0$ (resp. $< 0$), then $x_0$ is a local minimum (resp. local maximum). We immediately
encounter two problems:

• How do we interpret the second derivative $f''$ if $f : \mathbb{R}^n \to \mathbb{R}$?

We have already made some progress towards this when we introduced the Hessian matrix
in connection with the second-order Taylor expansion of a function $f$, i.e. we may interpret
the second derivative of $f$ at $\mathbf{x}_0$ as the $n \times n$-matrix $H_{\mathbf{x}_0}(f)$.

The second problem is the following:

• Now that we have an interpretation of the second derivative, how do we meaningfully
generalize

\[ f''(x_0) < 0 \quad \text{or} \quad f''(x_0) > 0 \; ? \]

The next example paves the way:

Example 7.4.8 Suppose that $f : \mathbb{R} \to \mathbb{R}$ is twice differentiable, that $f'(x_0) = 0$ and that $f''(x_0) > 0$. The
second-order Taylor expansion of $f$ about $x_0$ is then

\[ f(x_0 + h) = f(x_0) + 0 \cdot h + \tfrac{1}{2} f''(x_0) h^2 + \varepsilon(h) h^2 \]






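where, as usual, $\varepsilon(h) \to 0$ as $h \to 0$. In particular, there is $\delta > 0$ such that $|\varepsilon(h)| < \tfrac{1}{2}|f''(x_0)|$ whenever $|h| < \delta$.
Since $f''(x_0)$ is positive, we see that $-\tfrac{1}{2} f''(x_0) < \varepsilon(h) < \tfrac{1}{2} f''(x_0)$, and thus that $f''(x_0) > \tfrac{1}{2} f''(x_0) + \varepsilon(h) > 0$.
Thus, for sufficiently small $h \neq 0$, we have

\[ f(x_0 + h) - f(x_0) = \left( \tfrac{1}{2} f''(x_0) + \varepsilon(h) \right) h^2 > 0 \quad \text{i.e.} \quad f(x_0 + h) > f(x_0) \]

and thus $x_0$ is a local minimum of $f$.

Now suppose we attempt to copy this argument. Suppose that $f : \mathbb{R}^n \to \mathbb{R}$ is twice differentiable, and that $\mathbf{x}_0$
is a critical point of $f$. The second-order Taylor expansion of $f$ about $\mathbf{x}_0$ is given by:

\[ f(\mathbf{x}_0 + \mathbf{h}) = f(\mathbf{x}_0) + \tfrac{1}{2}\mathbf{h}^{\mathrm{tr}} H_{\mathbf{x}_0}(f)\mathbf{h} + \varepsilon(\mathbf{h})\|\mathbf{h}\|^2 \]

where $\varepsilon(\mathbf{h}) \to 0$ as $\mathbf{h} \to \mathbf{0}$.

Now because of this, we ought to have $|\varepsilon(\mathbf{h})| \cdot \|\mathbf{h}\|^2 < \tfrac{1}{2}|\mathbf{h}^{\mathrm{tr}} H_{\mathbf{x}_0}(f)\mathbf{h}|$ for $\mathbf{h}$ sufficiently close to
$\mathbf{0}$. (We'll explain this later, in the proof of Theorem 7.4.12.) Thus, for $\mathbf{x}_0$ to be a local minimum, it is enough
to insist that

\[ \mathbf{h}^{\mathrm{tr}} H_{\mathbf{x}_0}(f)\mathbf{h} > 0 \]

holds for all $\mathbf{h} \neq \mathbf{0}$ sufficiently close to $\mathbf{0}$. However, it is rather obvious that if it holds for all sufficiently small $\mathbf{h}$, it
will hold for all $\mathbf{h} \neq \mathbf{0}$: By choosing $\lambda > 0$ sufficiently small, we can make $\lambda\mathbf{h}$ as close to $\mathbf{0}$ as we like, and

\[ (\lambda\mathbf{h})^{\mathrm{tr}} H_{\mathbf{x}_0}(f)(\lambda\mathbf{h}) = \lambda^2 \left( \mathbf{h}^{\mathrm{tr}} H_{\mathbf{x}_0}(f)\mathbf{h} \right) \text{ has the same sign as } \mathbf{h}^{\mathrm{tr}} H_{\mathbf{x}_0}(f)\mathbf{h} \]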

□ 



Definition 7.4.9 An $n \times n$-matrix $A$ is said to be positive semi-definite if and only if
$\mathbf{h}^{\mathrm{tr}} A \mathbf{h} \ge 0$ for all $\mathbf{h} \in \mathbb{R}^n$. $A$ is said to be positive definite if and only if $\mathbf{h}^{\mathrm{tr}} A \mathbf{h} > 0$ for all
$\mathbf{h} \neq \mathbf{0}$ in $\mathbb{R}^n$.

The concepts of negative definite and negative semi-definite matrices are defined in a similar
way, by reversing the inequality signs.



Example 7.4.10 In statistics, symmetric positive (semi-)definite matrices occur naturally: Every covariance
matrix is positive semi-definite. To see this, suppose that $X_1, \dots, X_n$ are random variables, and that $\Sigma$
is their covariance matrix, i.e.

\[ \Sigma_{ij} = \mathrm{Cov}(X_i, X_j) = \mathbb{E}[(X_i - \mathbb{E}X_i)(X_j - \mathbb{E}X_j)] \]

Note that the covariance matrix of the random variables $X_1, \dots, X_n$ is the same as the covariance matrix of
the centered random variables $X_1 - \mathbb{E}X_1, \dots, X_n - \mathbb{E}X_n$. We may therefore assume without loss of generality
that $X_1, \dots, X_n$ are centered (i.e. that $\mathbb{E}X_i = 0$ for $i = 1, \dots, n$), and thus that $\Sigma_{ij} = \mathbb{E}[X_i X_j]$.

Now observe that if $\mathbf{a} = (a_1, \dots, a_n)^{\mathrm{tr}} \in \mathbb{R}^n$, then $Y = \mathbf{a}^{\mathrm{tr}} \mathbf{X} = a_1 X_1 + \dots + a_n X_n$ is a centered random
variable with variance

\[ \mathrm{Var}(Y) = \mathbb{E}[Y^2] = \mathbb{E}\Big[ \sum_{i=1}^n \sum_{j=1}^n a_i a_j X_i X_j \Big] = \sum_{i=1}^n \sum_{j=1}^n a_i a_j \Sigma_{ij} = \mathbf{a}^{\mathrm{tr}} \Sigma \mathbf{a} \]

As variances are non-negative, we see that

\[ \mathbf{a}^{\mathrm{tr}} \Sigma \mathbf{a} \ge 0 \quad \text{for all } \mathbf{a} \in \mathbb{R}^n \]

and thus that $\Sigma$ is positive semi-definite.

□ 
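The identity $\mathrm{Var}(\mathbf{a}^{\mathrm{tr}}\mathbf{X}) = \mathbf{a}^{\mathrm{tr}}\Sigma\mathbf{a}$ also holds exactly for empirical covariance matrices, which gives a quick sanity check. The sketch below is illustrative: the synthetic data, the seed, and the helper names are hypothetical, and the empirical covariance divides by the sample size $m$ (not $m - 1$) so that the identity is exact.

```python
import random

def covariance_matrix(samples):
    # Empirical covariance (dividing by m) of a list of observations,
    # each observation being a list of n numbers.
    m, n = len(samples), len(samples[0])
    means = [sum(row[i] for row in samples) / m for i in range(n)]
    return [[sum((row[i] - means[i]) * (row[j] - means[j]) for row in samples) / m
             for j in range(n)] for i in range(n)]

def quadratic_form(a, S):
    # a^tr S a
    n = len(a)
    return sum(a[i] * S[i][j] * a[j] for i in range(n) for j in range(n))

def variance(values):
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

rng = random.Random(0)           # fixed seed, so the run is reproducible
data = []
for _ in range(500):
    z = rng.gauss(0.0, 1.0)
    data.append([z + rng.gauss(0.0, 0.5), 2.0 * z, rng.uniform(-1.0, 1.0)])

S = covariance_matrix(data)
a = [1.0, -0.5, 2.0]
y = [sum(ai * xi for ai, xi in zip(a, row)) for row in data]
print(quadratic_form(a, S), variance(y))   # the two numbers should agree
```

Since the right-hand number is a variance, it cannot be negative, which is exactly why the quadratic form on the left cannot be negative either.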



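Exercise 7.4.11 Suppose that $A$ is an $n \times n$-matrix. Show that

\[ \langle\langle x, y \rangle\rangle := \langle x, Ay \rangle \]

defines an inner product $\langle\langle \cdot, \cdot \rangle\rangle$ on $\mathbb{R}^n$ if and only if $A$ is symmetric and (strictly) positive definite. (Here $\langle x, y \rangle$
denotes the standard Euclidean inner product, i.e. $\langle x, y \rangle := x^{\mathrm{tr}} y$.)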



□ 






Keeping in mind Example 7.4.8, the following proposition cannot be a surprise: 

Theorem 7.4.12 Suppose that $U \subseteq \mathbb{R}^n$ is an open set, and that $f : U \to \mathbb{R}$. If $Df(\mathbf{x}_0) = 0$
and $D^2 f(\mathbf{x}_0) := H_{\mathbf{x}_0}(f)$ is strictly positive (resp. negative) definite, then $f$ has a local
minimum (resp. maximum) at $\mathbf{x}_0$.

To prove it, we need a lemma: 

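Lemma 7.4.13 Suppose that $A$ is a (strictly) positive or negative definite $n \times n$-matrix.
Then there is a $C > 0$ so that

\[ C\|\mathbf{x}\|^2 \le |\mathbf{x}^{\mathrm{tr}} A \mathbf{x}| \quad \text{for all } \mathbf{x} \in \mathbb{R}^n \]

Proof: Suppose that $A$ is positive definite, so that $|\mathbf{x}^{\mathrm{tr}} A \mathbf{x}| = \mathbf{x}^{\mathrm{tr}} A \mathbf{x}$. Define $g : \mathbb{R}^n \to \mathbb{R}$ by

\[ g(\mathbf{x}) = \mathbf{x}^{\mathrm{tr}} A \mathbf{x} \]

It is clear that $g$ is continuous. The set $K = \{\mathbf{k} \in \mathbb{R}^n : \|\mathbf{k}\| = 1\}$ is compact in $\mathbb{R}^n$. By
Theorem 7.4.4, $g$ attains its minimum on $K$, i.e. there is $\mathbf{k}_{\min} \in K$ so that $\|\mathbf{k}_{\min}\| = 1$ and
so that

\[ g(\mathbf{k}_{\min}) \le g(\mathbf{k}) \quad \text{for all } \mathbf{k} \in K \]

Since $\mathbf{0} \notin K$, we see that $\mathbf{k}_{\min} \neq \mathbf{0}$, and so that $C := g(\mathbf{k}_{\min}) > 0$ (because $A$ is strictly
positive definite).

Now if $\mathbf{0} \neq \mathbf{x} \in \mathbb{R}^n$, then $\hat{\mathbf{x}} := \frac{\mathbf{x}}{\|\mathbf{x}\|} \in K$. So $g(\mathbf{x}) = \mathbf{x}^{\mathrm{tr}} A \mathbf{x} = \|\mathbf{x}\|^2 (\hat{\mathbf{x}}^{\mathrm{tr}} A \hat{\mathbf{x}}) \ge \|\mathbf{x}\|^2 C$.
The negative definite case follows by applying this argument to $-A$.

⊣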
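The constant $C$ of Lemma 7.4.13 is the minimum of the quadratic form on the unit sphere, and in two dimensions it can be approximated by sampling the unit circle. The sketch below is an illustration only: the matrix `A`, the sample count, and the test vectors are hypothetical, and sampling yields an upper estimate of the true minimum rather than the minimum itself.

```python
import math

def quadratic_form(A, x):
    # x^tr A x for a 2 x 2 matrix A and a vector x in R^2.
    return sum(x[i] * A[i][j] * x[j] for i in range(2) for j in range(2))

def min_on_unit_circle(A, samples=3600):
    # Approximates the minimum of x^tr A x over the unit circle by sampling.
    angles = (2.0 * math.pi * k / samples for k in range(samples))
    return min(quadratic_form(A, (math.cos(a), math.sin(a))) for a in angles)

A = ((2.0, 1.0), (1.0, 2.0))     # hypothetical positive definite matrix
C = min_on_unit_circle(A)
for x in [(3.0, -3.0), (0.1, 0.2), (-5.0, 1.0)]:
    norm_sq = x[0] ** 2 + x[1] ** 2
    print(x, quadratic_form(A, x), C * norm_sq)   # middle value >= right value
```

For this particular matrix the quadratic form on the unit circle equals $2 + \sin(2\theta)$, so the estimate of $C$ should be very close to $1$.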

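Proof of Theorem 7.4.12: The argument provided in Example 7.4.8 is adequate. The only
thing we need to verify is that $|\varepsilon(\mathbf{h})| \cdot \|\mathbf{h}\|^2 < \tfrac{1}{2}\mathbf{h}^{\mathrm{tr}} H_{\mathbf{x}_0}(f)\mathbf{h}$ for $\mathbf{h} \neq \mathbf{0}$ sufficiently close
to $\mathbf{0}$.

By the preceding lemma, there is a constant $C > 0$ so that

\[ \mathbf{x}^{\mathrm{tr}} H_{\mathbf{x}_0}(f)\mathbf{x} \ge C\|\mathbf{x}\|^2 \quad \text{for all } \mathbf{x} \in \mathbb{R}^n \]

Since $\varepsilon(\mathbf{h}) \to 0$ as $\mathbf{h} \to \mathbf{0}$, we can find a $\delta > 0$ so that $|\varepsilon(\mathbf{h})| < \tfrac{1}{2}C$ whenever $\|\mathbf{h}\| < \delta$. It
follows that

\[ |\varepsilon(\mathbf{h})| \cdot \|\mathbf{h}\|^2 < \tfrac{1}{2}C\|\mathbf{h}\|^2 \le \tfrac{1}{2}\mathbf{h}^{\mathrm{tr}} H_{\mathbf{x}_0}(f)\mathbf{h} \quad \text{whenever } 0 < \|\mathbf{h}\| < \delta \]

⊣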

We have now succeeded in generalizing Fact II: $f''(x_0)$ becomes $H_{\mathbf{x}_0}(f)$, and

\[ f''(x_0) > 0 \quad \text{becomes} \quad H_{\mathbf{x}_0}(f) \text{ is positive definite} \]

It remains to present some results which allow one to recognize whether or not a symmetric
matrix $A$ is positive or negative definite. Recall the following results from linear algebra:

Theorem 7.4.14 (Spectral Theorem) An $n \times n$-matrix $A$ is symmetric if and only if $\mathbb{R}^n$
has an orthonormal basis of eigenvectors of $A$.
In particular, $A$ is diagonalizable.






Corollary 7.4.15 A symmetric $n \times n$-matrix $A$ is positive definite (resp. semi-definite)
if and only if all the eigenvalues of $A$ are strictly positive (resp. non-negative).
Similarly, a symmetric $n \times n$-matrix $A$ is negative definite (resp. semi-definite) if
and only if all the eigenvalues of $A$ are strictly negative (resp. non-positive).



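Proof: Since $A$ is symmetric, there exists an orthonormal basis $\{\mathbf{v}_i : i = 1, \dots, n\}$ of
eigenvectors of $A$. Let $\lambda_i$ be the corresponding eigenvalues, i.e. $A\mathbf{v}_i = \lambda_i \mathbf{v}_i$. Then

\[ \mathbf{v}_i^{\mathrm{tr}} A \mathbf{v}_j = \lambda_j \mathbf{v}_i^{\mathrm{tr}} \mathbf{v}_j = \lambda_j \delta_{ij} \]

It follows that if $\mathbf{a} \in \mathbb{R}^n$, then $\mathbf{a} = \sum_{i=1}^n a^i \mathbf{v}_i$ for some $a^i \in \mathbb{R}$, and so

\[ \mathbf{a}^{\mathrm{tr}} A \mathbf{a} = \sum_{i,j=1}^n a^i a^j (\mathbf{v}_i^{\mathrm{tr}} A \mathbf{v}_j) = \sum_{i,j=1}^n a^i a^j \lambda_j \delta_{ij} = \sum_{i=1}^n (a^i)^2 \lambda_i \]

In particular, if $A$ is positive definite, then $0 < \mathbf{v}_i^{\mathrm{tr}} A \mathbf{v}_i = \lambda_i$, i.e. all the eigenvalues of $A$
are strictly positive. Similarly, if $A$ is positive semi-definite, then $0 \le \mathbf{v}_i^{\mathrm{tr}} A \mathbf{v}_i = \lambda_i$, so that
the eigenvalues are non-negative.

Conversely, suppose that all the eigenvalues $\lambda_i$ are strictly positive. If $\mathbf{a} := \sum_{i=1}^n a^i \mathbf{v}_i \neq \mathbf{0}$,
then $a^i \neq 0$ for some $i$, and $\mathbf{a}^{\mathrm{tr}} A \mathbf{a} = \sum_{i=1}^n (a^i)^2 \lambda_i > 0$. Hence $A$ is positive definite. Similarly,
if all the eigenvalues are merely non-negative, then $\mathbf{a}^{\mathrm{tr}} A \mathbf{a} \ge 0$, and so $A$ is positive semi-definite.

The proof for the negative (semi-)definite case is similar.

⊣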
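For a symmetric $2 \times 2$ matrix the eigenvalues have a closed form, so the criterion of Corollary 7.4.15 can be turned into a small classifier. This is a sketch for illustration only: the function names are hypothetical, it handles just the $2 \times 2$ case, and it compares floating-point eigenvalues with zero exactly, which is fragile for matrices that are nearly singular.

```python
import math

def eigenvalues_sym2(a, b, d):
    # Eigenvalues of the symmetric matrix [[a, b], [b, d]], obtained from
    # the characteristic polynomial (closed form, 2 x 2 case only).
    mean = 0.5 * (a + d)
    radius = math.hypot(0.5 * (a - d), b)
    return (mean - radius, mean + radius)

def classify(a, b, d):
    # Definiteness of [[a, b], [b, d]] read off from the eigenvalue signs.
    lo, hi = eigenvalues_sym2(a, b, d)
    if lo > 0:
        return "positive definite"
    if hi < 0:
        return "negative definite"
    if lo >= 0:
        return "positive semi-definite"
    if hi <= 0:
        return "negative semi-definite"
    return "indefinite"

for entries in [(2, 1, 2), (-2, 0, -3), (1, 1, 1), (1, 2, 1)]:
    print(entries, eigenvalues_sym2(*entries), classify(*entries))
```

Applied to a Hessian at a critical point, "positive definite" corresponds to a local minimum and "negative definite" to a local maximum, in line with Theorem 7.4.12.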

Here is a slight strengthening of Theorem 7.4.12: 



Theorem 7.4.16 Suppose that $Df(\mathbf{x}_0) = 0$ and that $H_{\mathbf{x}_0}(f)$ is non-singular. Then $f$ has
a local minimum (resp. maximum) at $\mathbf{x}_0$ if and only if $H_{\mathbf{x}_0}(f)$ is strictly positive (resp.
negative) definite.



Exercise 7.4.17 Prove Theorem 7.4.16.

[Hints: Note that the ($\Leftarrow$)-direction of the Theorem is just Theorem 7.4.12. We therefore need only prove the
($\Rightarrow$)-direction. Suppose therefore that $H := H_{\mathbf{x}_0}(f)$ is non-singular, and that $f$ has a local minimum at $\mathbf{x}_0$.
Using a relationship between the determinant of a matrix and its eigenvalues, explain why all the eigenvalues
of $H$ are non-zero. Now suppose that one of the eigenvalues $\lambda_i$ (belonging to an eigenvector $\mathbf{v}_i$) is strictly
negative. To show that $\mathbf{x}_0$ is not a local minimum of $f$, it suffices to show that $f(\mathbf{x}_0 + t\mathbf{v}_i) < f(\mathbf{x}_0)$ for all
sufficiently small $t \neq 0$. Apply Lemma 7.4.13 to conclude this result: the $1 \times 1$-"matrix" $\lambda_i$ is negative definite.]

□ 

Remarks 7.4.18 If $f$ has a Hessian $H_{\mathbf{x}_0}(f)$ at a critical point $\mathbf{x}_0$ which has both positive and negative
eigenvalues, then $f$ may have neither a maximum nor a minimum at $\mathbf{x}_0$. In that case, $\mathbf{x}_0$ is called a saddle
point.



□ 






7.4.3 Linear Least Squares

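It is well-known that a parabola is completely determined by specifying three of its points,
i.e. if one knows three points $(t_1, y_1), (t_2, y_2), (t_3, y_3)$ lying on a parabola $y = at^2 + bt + c$, then
one can find the parabola: Just solve

\[ \begin{pmatrix} t_1^2 & t_1 & 1 \\ t_2^2 & t_2 & 1 \\ t_3^2 & t_3 & 1 \end{pmatrix} \begin{pmatrix} a \\ b \\ c \end{pmatrix} = \begin{pmatrix} y_1 \\ y_2 \\ y_3 \end{pmatrix} \]

for $a, b, c$. Assuming none of the $t_i$'s are the same, the $3 \times 3$-Vandermonde matrix is invertible,
so the solution is unique.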

If one observed a parabola in nature, however, e.g. the path of a comet, it would not be
a good idea to determine the parabola by measuring just three points on its path, because of
measurement error. It is better to observe the comet at a large number of times, and then
to obtain the $a, b, c$ which best fit the data. (Gauss developed the method of least squares
precisely for the problem of determining orbits of celestial bodies.)

In general, given functions $\phi_1(t), \dots, \phi_n(t)$ and observations $(t_1, y_1), \dots, (t_m, y_m)$, we may
try to find coefficients $x_1, \dots, x_n$ which best fit the data, i.e. so that

\[ y_j \approx \sum_{i=1}^n x_i \phi_i(t_j) \quad \text{for } j = 1, \dots, m \]

For the problem described above, we have $\phi_i(t) := t^i$ ($i = 0, 1, 2$), but we can take more
general functions.

The word best, of course, needs further elucidation: For each $\mathbf{x} := (x_1, \dots, x_n)$ we obtain
numbers $y_j^*(\mathbf{x}) := \sum_{i=1}^n x_i \phi_i(t_j)$. We want that $y_j \approx y_j^*(\mathbf{x})$ for all $j = 1, \dots, m$, i.e. we want
to determine that $n$-tuple $\mathbf{x}$ for which the combined error is as small as possible. We therefore
want the distance between the vectors $\mathbf{y} := (y_1, \dots, y_m)$ and $\mathbf{y}^*(\mathbf{x}) := (y_1^*(\mathbf{x}), \dots, y_m^*(\mathbf{x}))$ to be as
small as possible, i.e. we want to find that value $\mathbf{x}$ for which

\[ \|\mathbf{y} - \mathbf{y}^*(\mathbf{x})\| \]

is a minimum, where $\| \cdot \|$ denotes the usual Euclidean norm on $\mathbb{R}^m$. Because of the structure
of this norm (square root of sum of squares) it is slightly more convenient to minimize
$\|\mathbf{y} - \mathbf{y}^*(\mathbf{x})\|^2$ instead, though this will of course amount to the same thing (as $x \mapsto x^2$ is strictly
increasing on the positive real line). Also note that



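\[ \mathbf{y}^*(\mathbf{x}) = \begin{pmatrix} \phi_1(t_1) & \dots & \phi_n(t_1) \\ \vdots & & \vdots \\ \phi_1(t_m) & \dots & \phi_n(t_m) \end{pmatrix} \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} =: A\mathbf{x} \]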



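We have here a function $f : \mathbb{R}^n \to \mathbb{R} : \mathbf{x} \mapsto \|\mathbf{y} - A\mathbf{x}\|^2$ whose minimum we want to find.
Now the Euclidean norm is determined by the usual inner product, $\|\mathbf{z}\|^2 = \langle \mathbf{z}, \mathbf{z} \rangle$. Thus

\[ f(\mathbf{x}) = \langle \mathbf{y}, \mathbf{y} \rangle - 2\langle \mathbf{y}, A\mathbf{x} \rangle + \langle A\mathbf{x}, A\mathbf{x} \rangle \]

To determine the minimum, we first solve $Df(\mathbf{x}) = 0$, using Leibniz' Rule:

\[ Df(\mathbf{x}) = -2\langle \mathbf{y}, DA(\mathbf{x}) \rangle + 2\langle DA(\mathbf{x}), A\mathbf{x} \rangle \]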






which yields

\[ \langle DA(x), Ax - y\rangle = 0 \quad \text{i.e.} \quad \langle DA(x)(h), Ax - y\rangle = 0 \text{ for all } h \in \mathbb{R}^n \]

Now $A : \mathbb{R}^n \to \mathbb{R}^m$ is clearly linear, so $DA(x) = A$, for all $x$. Thus

\[ \langle Ah, Ax - y\rangle = 0 \quad \text{for all } h \in \mathbb{R}^n \]

Noting that $\langle a, b\rangle = a^T b$, we see that $h^T A^T (Ax - y) = 0$ for all $h$, and thus that
$A^T (Ax - y) = 0$, which yields the normal equations

\[ A^T A x = A^T y \]

which can (hopefully) be solved for $x$, as $A^T A$ is a square symmetric matrix.
A simple calculation (left as an exercise) shows that the Hessian of $f$ is

\[ H_x(f) = 2 A^T A \quad \text{for all } x \]

Hence if $A^T A$ is positive definite, then the solution of $Df(x) = 0$ must yield a minimum of $f$. It
is easy to see that $A^T A$ is always at least positive semi-definite, as $x^T A^T A x = \langle Ax, Ax\rangle =
\|Ax\|^2 \geq 0$ for all $x$. Clearly if $Ax \neq 0$ whenever $x \neq 0$, then $A^T A$ is strictly positive definite,
and this will occur precisely when $A$ is one-to-one, i.e. when the column vectors of $A$ are
linearly independent.
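
The normal equations can be tried out on a small data set. The following is only a sketch: the helper names (`solve_linear`, `design_matrix`, `least_squares`) and the sample data are illustrative assumptions, not part of these notes, and exact rational arithmetic is used so that the identities hold without rounding. It fits a parabola (basis $1, t, t^2$) by solving $A^T A x = A^T y$, and checks that the residual $Ax - y$ is orthogonal to the columns of $A$.

```python
from fractions import Fraction

def solve_linear(M, b):
    """Solve the square system M x = b by Gauss-Jordan elimination (exact arithmetic)."""
    n = len(M)
    aug = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(M, b)]
    for col in range(n):
        # Find a row at or below the diagonal with a non-zero entry in this column.
        pivot = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot is None:
            raise ValueError("matrix is singular (columns of A are dependent)")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * c for a, c in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]

def design_matrix(ts, basis):
    """Row j is (phi_1(t_j), ..., phi_n(t_j))."""
    return [[Fraction(phi(t)) for phi in basis] for t in ts]

def least_squares(ts, ys, basis):
    """Return the coefficients x solving the normal equations A^T A x = A^T y."""
    A = design_matrix(ts, basis)
    n = len(basis)
    AtA = [[sum(row[i] * row[j] for row in A) for j in range(n)] for i in range(n)]
    Aty = [sum(row[i] * Fraction(y) for row, y in zip(A, ys)) for i in range(n)]
    return solve_linear(AtA, Aty)

# Hypothetical data: points on y = 1 + 2t + 3t^2, then a perturbed copy.
basis = [lambda t: 1, lambda t: t, lambda t: t * t]
ts = [-2, -1, 0, 1, 2]
exact = [1 + 2 * t + 3 * t * t for t in ts]
noisy = [y + e for y, e in zip(exact, [1, -1, 0, 1, -1])]

coeffs_exact = least_squares(ts, exact, basis)
coeffs_noisy = least_squares(ts, noisy, basis)

# Residual Ax - y for the noisy fit, and its inner products with the columns of A.
A = design_matrix(ts, basis)
residual = [sum(a * c for a, c in zip(row, coeffs_noisy)) - y for row, y in zip(A, noisy)]
normal_check = [sum(row[i] * r for row, r in zip(A, residual)) for i in range(3)]
```

For data lying exactly on a parabola the fit recovers its coefficients; for the perturbed data the vector `normal_check` vanishes, which is the statement $A^T(Ax - y) = 0$.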

Now let's go back a bit, to see what is going on geometrically: We want to find an $x$
for which the quantity $\|y - Ax\|$ is a minimum. Let $V$ be the range (or image) of $A$, i.e.
$V := \{z \in \mathbb{R}^m : \exists x \in \mathbb{R}^n\, (z = Ax)\}$. Clearly $V$ is a linear subspace of $\mathbb{R}^m$, and is the space
spanned by the column vectors of the matrix $A$. To say that $\|y - Ax\|$ is a minimum is to
say that $Ax$ is the point in $V$ which lies closest to $y$. Thus we are seeking the orthogonal
projection of $y$ onto the subspace $V$.

It will become clear that all we need to solve this best approximation problem is the 
existence of orthogonal projection, and this exists in any Hilbert space. These ideas will be 
encountered again when we look at Fourier analysis, and also when we discuss conditional
expectation. 



Chapter 8 

Products and Independence 



8.1 The Monotone Class Theorem 

We have already seen that $\sigma$-algebras can be quite complicated to deal with, and that in
many cases, it is easier to work with simpler classes of sets. We therefore "broke up" the
notion of $\sigma$-algebra into two parts, that of $\pi$-system and $\lambda$-system. For completeness, we will
recall all the definitions and results below, but what you need to know is the following:

• A system $\mathcal{A}$ of sets is a $\sigma$-algebra iff it is both a $\pi$-system and a $\lambda$-system.

• The notion of $\lambda$-system meshes very nicely with the continuity properties of measure.

• Therefore to prove that something is true for all events of a $\sigma$-algebra it often suffices
to show that it is true for the events in a $\pi$-system that generates the $\sigma$-algebra. The
events in such a $\pi$-system can often be very simple (e.g. they may be just intervals, or
"rectangles").



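Definition 8.1.1 Let $\mathcal{C}$ be a collection of subsets of $\Omega$.

(a) $\mathcal{C}$ is called a $\pi$-system if it is closed under finite intersections.

(b) $\mathcal{C}$ is called a $\lambda$-system if

(i) $\Omega \in \mathcal{C}$;

(ii) $A, B \in \mathcal{C}$ and $A \subseteq B$ implies $B - A \in \mathcal{C}$;

(iii) If $A_1, A_2, \dots \in \mathcal{C}$ and $A_n \uparrow A$, then $A \in \mathcal{C}$.

(c) We denote by $\pi(\mathcal{C})$ and $\lambda(\mathcal{C})$ the $\pi$-, respectively, $\lambda$-system generated
by $\mathcal{C}$, i.e. the smallest $\pi$-, respectively, $\lambda$-system on $\Omega$ which
contains $\mathcal{C}$.



[Why do $\pi(\mathcal{C})$, $\lambda(\mathcal{C})$ always exist?]

Proposition 8.1.2 A family $\mathcal{C}$ of subsets of $\Omega$ is a $\sigma$-algebra iff it is both a $\pi$-system and
a $\lambda$-system.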




Proof: It is clear that a $\sigma$-algebra is also a $\pi$-system and a $\lambda$-system.

Conversely, suppose $\mathcal{C}$ is both a $\pi$- and a $\lambda$-system. Then $\mathcal{C}$ is closed under complementation,
by (i), (ii) of Defn. 8.1.1(b). De Morgan's Laws applied to Defn. 8.1.1(a) show that
$\mathcal{C}$ is closed under finite unions. Finally, given $A_1, A_2, \dots \in \mathcal{C}$, let $A = \bigcup_n A_n$. Define

\[ B_n := \bigcup_{k=1}^{n} A_k \]

Then each $B_n \in \mathcal{C}$, and $B_n \uparrow A$. Hence $A \in \mathcal{C}$, by Defn. 8.1.1(b)(iii), and thus $\mathcal{C}$ is closed
under countable unions.

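The following technical result often allows us to work with "easy" $\pi$-systems, instead of
the "difficult" $\sigma$-algebras:

Theorem 8.1.3 (Dynkin's Lemma, Monotone Class Theorem)

(a) If $\mathcal{C}$ is a $\pi$-system on $\Omega$, then

\[ \lambda(\mathcal{C}) = \sigma(\mathcal{C}) \]

(b) Suppose that $\mathcal{C}$ is a $\pi$-system and that $\mathcal{D}$ is a $\lambda$-system (both on a
set $\Omega$), and also that $\mathcal{C} \subseteq \mathcal{D}$. Then $\sigma(\mathcal{C}) \subseteq \mathcal{D}$.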

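Proof: (a) Let $\mathcal{D} = \lambda(\mathcal{C})$. By Propn. 8.1.2, it suffices to show that $\mathcal{D}$ is a $\pi$-system. We do
this in two steps.

STEP I: Fix $C \in \mathcal{C}$, and define

\[ \mathcal{D}_C = \{A \in \mathcal{D} : A \cap C \in \mathcal{D}\} \]

Then $\mathcal{C} \subseteq \mathcal{D}_C \subseteq \mathcal{D}$ (because $\mathcal{C}$ is a $\pi$-system). We now show that $\mathcal{D}_C = \mathcal{D}$. To that end, it
suffices to show that $\mathcal{D}_C$ is a $\lambda$-system (because then $\mathcal{D}_C$ is a $\lambda$-system containing $\mathcal{C}$, and $\mathcal{D}$
is the smallest such). We therefore verify (i)-(iii) of Defn. 8.1.1(b):
(i) is obvious.

If $A, B \in \mathcal{D}_C$ and $A \subseteq B$, then $(B - A) \cap C = (B \cap C) - (A \cap C)$. But $B \cap C, A \cap C \in \mathcal{D}$ by
definition of $\mathcal{D}_C$, and thus $(B - A) \cap C \in \mathcal{D}$, because $\mathcal{D}$ is a $\lambda$-system. Thus $(B - A) \in \mathcal{D}_C$.
Finally, if $A_1, A_2, \dots \in \mathcal{D}_C$ and $A_n \uparrow A$, then $A_1 \cap C, A_2 \cap C, \dots \in \mathcal{D}$ and $(A_n \cap C) \uparrow A \cap C$.
Hence $A \cap C \in \mathcal{D}$, and so $A \in \mathcal{D}_C$.

We now know that $\mathcal{D}_C = \mathcal{D}$ for every $C \in \mathcal{C}$.
STEP II: Now, fix any $D \in \mathcal{D}$, and define

\[ \mathcal{D}_D = \{A \in \mathcal{D} : A \cap D \in \mathcal{D}\} \]

First note that if $C \in \mathcal{C}$, then $\mathcal{D}_C = \mathcal{D}$, so $D \in \mathcal{D}_C$. It follows that $D \cap C \in \mathcal{D}$, and thus that
$C \in \mathcal{D}_D$, for every $C \in \mathcal{C}$. Thus $\mathcal{C} \subseteq \mathcal{D}_D$, for all $D \in \mathcal{D}$.

It follows as above that $\mathcal{D}_D$ is a $\lambda$-system, and thus that $\mathcal{D}_D = \mathcal{D}$, for all $D \in \mathcal{D}$.

In particular, if $A, B \in \mathcal{D}$, then $A \in \mathcal{D}_B$, and so $A \cap B \in \mathcal{D}$. This shows that $\mathcal{D}$ is a
$\pi$-system, and thus a $\sigma$-algebra. Hence $\sigma(\mathcal{C}) \subseteq \mathcal{D} = \lambda(\mathcal{C})$; the reverse inclusion holds
because $\sigma(\mathcal{C})$ is itself a $\lambda$-system containing $\mathcal{C}$.

(b) follows directly from (a). (Why?)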






⊣

An important corollary of the preceding theorem is the following: Two probability measures
which agree on a $\pi$-system agree also on the $\sigma$-algebra generated by that $\pi$-system.

Corollary 8.1.4 Suppose that $\mu, \nu$ are probability measures on a space
$(\Omega, \mathcal{F})$, and that $\mathcal{A}$ is a $\pi$-system which generates $\mathcal{F}$. If $\mu, \nu$ agree on
$\mathcal{A}$, then they agree on $\mathcal{F}$, i.e. $\mu = \nu$.

Proof: (Outline) Use the continuity of measure to show that the set $\mathcal{D} := \{F \in \mathcal{F} : \mu F = \nu F\}$
is a $\lambda$-system which contains $\mathcal{A}$. Then deduce that $\mathcal{D} \supseteq \sigma(\mathcal{A}) = \mathcal{F}$.

⊣
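
Dynkin's Lemma (Thm. 8.1.3(a)) can be checked by brute force on a finite set. The sketch below is an illustration under stated assumptions: on a finite $\Omega$ condition (iii) of Defn. 8.1.1(b) is automatic, so a $\lambda$-system is taken to be a family containing $\Omega$ and closed under proper differences; the function names and the two generating families are hypothetical choices made for this example.

```python
def close(sets, step):
    """Smallest family containing `sets` that is closed under the operation `step`."""
    family = set(sets)
    while True:
        new = step(family) - family
        if not new:
            return family
        family |= new

def lambda_system(omega, gens):
    """Generated lambda-system on a finite omega: contains omega, closed under B - A for A <= B."""
    def step(fam):
        return {b - a for a in fam for b in fam if a <= b}
    return close(set(gens) | {omega}, step)

def sigma_algebra(omega, gens):
    """Generated sigma-algebra on a finite omega: closed under complements and unions."""
    def step(fam):
        out = {omega - a for a in fam}
        out |= {a | b for a in fam for b in fam}
        return out
    return close(set(gens) | {omega}, step)

def is_pi_system(gens):
    """Check closure under pairwise (hence finite) intersections."""
    return all((a & b) in gens for a in gens for b in gens)

omega = frozenset(range(4))

# A nested family is a pi-system; here lambda(C) and sigma(C) coincide.
pi_gens = {frozenset({0}), frozenset({0, 1}), frozenset({0, 1, 2})}

# Not a pi-system: the intersection {1} is missing, and lambda(C) is strictly smaller.
bad_gens = {frozenset({0, 1}), frozenset({1, 2})}

lam_pi, sig_pi = lambda_system(omega, pi_gens), sigma_algebra(omega, pi_gens)
lam_bad, sig_bad = lambda_system(omega, bad_gens), sigma_algebra(omega, bad_gens)
```

For the nested family both closures are the full power set of $\Omega$. For the second family the generated $\lambda$-system has only six members while the generated $\sigma$-algebra has sixteen, which shows that the $\pi$-system hypothesis cannot simply be dropped.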

The full power of the preceding technical results will now become apparent:
Theorem 8.1.5 (Monotone Class Theorem)

Let $\mathcal{H}$ be a set of bounded functions from a set $\Omega$ into $\mathbb{R}$ satisfying the
following conditions:

(i) $\mathcal{H}$ is a vector space.

(ii) The constant function 1 belongs to $\mathcal{H}$.

(iii) Given any sequence $h_n$ of non-negative elements of $\mathcal{H}$ such that
$h_n \uparrow h$, if $h$ is bounded, then $h \in \mathcal{H}$.

Let $\mathcal{A}$ be a $\pi$-system on $\Omega$ with the property that $I_A \in \mathcal{H}$ for every
$A \in \mathcal{A}$.

Then every bounded $\sigma(\mathcal{A})$-measurable function belongs to $\mathcal{H}$.

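Proof: Let $\mathcal{D} = \{F \subseteq \Omega : I_F \in \mathcal{H}\}$. It is not hard to show that $\mathcal{D}$ is a $\lambda$-system, and it
contains $\mathcal{A}$ by assumption. By Dynkin's Lemma (Thm 8.1.3), $\mathcal{D} \supseteq \sigma(\mathcal{A})$.

Let $h$ be a non-negative, bounded $\sigma(\mathcal{A})$-measurable function, with upper bound $K$, i.e.

\[ 0 \leq h(\omega) \leq K \quad \text{for all } \omega \in \Omega \]

Let $h_n$, $n \in \mathbb{N}$ be a sequence of simple $\sigma(\mathcal{A})$-measurable functions such that $h_n \uparrow h$. (Recall
how this is done: If we define

\[ h_n(\omega) := 2^{-n}[2^n h(\omega)] = \sum_{k=1}^{[2^n K]} \frac{k}{2^n} I_{A(n,k)}(\omega) \]

where $[x]$ denotes the largest integer $\leq x$ and $A(n,k) := \{\omega \in \Omega : \frac{k}{2^n} \leq h(\omega) < \frac{k+1}{2^n}\}$, then the $h_n$ are simple
functions with $h_n \uparrow h$. Since $h$ is $\sigma(\mathcal{A})$-measurable, each $A(n,k) \in \sigma(\mathcal{A}) \subseteq \mathcal{D}$, i.e. $I_{A(n,k)} \in \mathcal{H}$.)
Because $\mathcal{H}$ is a vector space, we now see that $h_n \in \mathcal{H}$ for each $n \in \mathbb{N}$. Thus $h \in \mathcal{H}$ as well, by (iii).

We have now shown that every non-negative bounded $\sigma(\mathcal{A})$-measurable function belongs
to $\mathcal{H}$. The same result can be obtained for arbitrary bounded $h$ by splitting into positive and
negative parts: $h = h^+ - h^-$.

⊣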
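
The dyadic approximation $h_n = 2^{-n}[2^n h]$ used in the proof is easy to experiment with. The following sketch is illustrative only: the function `dyadic_approx`, the choice $h(\omega) = \omega^2$ on $[0, 1]$ and the sample grid are assumptions made for the example.

```python
import math

def dyadic_approx(h, n):
    """Return h_n = 2^{-n} * floor(2^n * h), a simple function lying below h."""
    def h_n(omega):
        return math.floor(2**n * h(omega)) / 2**n
    return h_n

# Hypothetical bounded non-negative function on [0, 1].
def h(omega):
    return omega * omega

grid = [i / 50 for i in range(51)]

# Largest gap h - h_n over the grid, for n = 0, ..., 9; each gap is below 2^{-n}.
max_gaps = [max(h(w) - dyadic_approx(h, n)(w) for w in grid) for n in range(10)]

# h_n increases with n at every grid point.
increasing = all(
    dyadic_approx(h, n)(w) <= dyadic_approx(h, n + 1)(w)
    for n in range(9)
    for w in grid
)
```

Each $h_n$ takes only finitely many values (multiples of $2^{-n}$), the sequence increases, and the gap to $h$ is less than $2^{-n}$ everywhere, so the convergence is even uniform when $h$ is bounded.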

The Monotone Class Theorem can be used in the same way as the "standard machine",
and is often more powerful. (Indeed, there are even more powerful monotone class theorems
available, but they are not cheap, and we will not need them.)






8.2 Products 
8.2.1 Introduction 

Example 8.2.1 (a) Denote by $\mu A$ the area of a subset $A$ of $\mathbb{R}^2$. We know how to define $\mu$ on rectangles,
i.e. sets of the form $A = B_1 \times B_2$, where $B_1, B_2$ are intervals in $\mathbb{R}$: Indeed

\[ \mu A = \lambda B_1 \times \lambda B_2 \qquad (*) \]

where $\lambda$ is Lebesgue measure. So $\mu$ is to be a measure on $(\mathbb{R}^2, \mathcal{B}(\mathbb{R}^2))$ such that $\mu(B_1 \times B_2) = \lambda(B_1)\lambda(B_2)$.
Of course, many sets in $\mathcal{B}(\mathbb{R}^2)$ do not have the form $B_1 \times B_2$, and we would like $\mu$ to be defined for them
as well. So (*) cannot serve as a definition of $\mu$.

(b) In probability theory, it is quite natural to consider the product of two probability spaces. Such products
typically model sequences of independent experiments. For example, let $\Omega_1 = \{H, T\}$, $\mathcal{F}_1 = \mathcal{P}(\Omega_1)$ and let
$P_1\{H\} = \frac12 = P_1\{T\}$. Then $(\Omega_1, \mathcal{F}_1, P_1)$ models the tossing of a fair coin. Now let $\Omega_2 = \{1, 2, \dots, 6\}$, $\mathcal{F}_2 =
\mathcal{P}(\Omega_2)$ and $P_2\{1\} = P_2\{2\} = \dots = P_2\{6\} = \frac16$. Then $(\Omega_2, \mathcal{F}_2, P_2)$ models the rolling of a fair die. The
underlying set of the probability space which models the combined random experiment "First toss a fair
coin, and then roll a fair die" can clearly be taken to be the cartesian product $\Omega = \Omega_1 \times \Omega_2$. The natural
$\sigma$-algebra will be $\mathcal{F} = \mathcal{P}(\Omega_1 \times \Omega_2)$, and it is not hard to see that this $\sigma$-algebra is generated by the
$\pi$-system $\{B_1 \times B_2 : B_1 \in \mathcal{F}_1, B_2 \in \mathcal{F}_2\}$. Now the event $B_1 \times B_2 \subseteq \Omega$ consists of all those outcomes
$\omega = (\omega_1, \omega_2) \in \Omega_1 \times \Omega_2$ having $\omega_1 \in B_1$ and $\omega_2 \in B_2$. Thus $B_1 \times B_2$ occurs in the combined random
experiment iff $B_1$ and $B_2$ occur in each of the individual experiments.

The probability measure associated with the combined random experiment would therefore naturally
satisfy

\[ P(B_1 \times B_2) = P_1(B_1) P_2(B_2) \qquad (**) \]

But not every event in $\mathcal{P}(\Omega_1 \times \Omega_2)$ is of the form $B_1 \times B_2$, so (**) cannot serve as a definition of $P$.

□ 
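
For the coin-and-die experiment of part (b) everything is finite, so the product measure can be written down directly by summing products of point masses. The sketch below is only an illustration for this discrete case (the names `measure` and `product_measure` are assumptions of the example); it checks (**) on a rectangle and evaluates the measure on an event that is not a rectangle.

```python
from fractions import Fraction
from itertools import product

# Point masses of the fair coin and the fair die.
P1 = {"H": Fraction(1, 2), "T": Fraction(1, 2)}
P2 = {k: Fraction(1, 6) for k in range(1, 7)}

def measure(P, B):
    """Measure of an event B under the discrete distribution P."""
    return sum((P[w] for w in B), Fraction(0))

def product_measure(event):
    """Sum of P1{w1} * P2{w2} over the outcomes (w1, w2) in the event."""
    return sum((P1[w1] * P2[w2] for (w1, w2) in event), Fraction(0))

# A measurable rectangle: "heads, then an even number".
B1, B2 = {"H"}, {2, 4, 6}
rectangle = set(product(B1, B2))
rectangle_value = product_measure(rectangle)

# An event that is not of the form B1 x B2.
non_rectangle = {("H", 1), ("T", 2)}
non_rectangle_value = product_measure(non_rectangle)
```

In the discrete case the point masses determine the measure on every event. For Lebesgue measure no such shortcut is available, which is why the general construction that follows is needed.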

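The aim of this section is to construct, out of two measure spaces $(S, \mathcal{S}, \mu)$, $(T, \mathcal{T}, \nu)$, a
new measure space $(S \times T, \mathcal{S} \otimes \mathcal{T}, \mu \otimes \nu)$ satisfying the following requirements:

(i) A subset of $S \times T$ is called a measurable rectangle if it has the form $A \times B$ where
$A \in \mathcal{S}$, $B \in \mathcal{T}$.
$\mathcal{S} \otimes \mathcal{T}$ is defined to be the smallest $\sigma$-algebra on $S \times T$ which has all rectangles with
measurable sides as members.

(ii) For each rectangle $A \times B$, we require that $(\mu \otimes \nu)(A \times B) = \mu A \cdot \nu B$.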

Remarks 8.2.2 (a) A remark on notation: We will be working with functions of more than one variable,
and may integrate with respect to just one of those variables. We therefore introduce the following
notation:

\[ \int f(x)\, \mu(dx) =: \mu f =: \mu^x f(x) \]

Thus, for example, $\int f(x,y)\, \mu(dx)$ integrates the function $f(x,y)$ over $x$, keeping $y$ fixed. The integral
$\iint f(x,y)\, \mu(dx)\, \nu(dy)$ is a double integral that first integrates $f$ w.r.t. $\mu$ over the variable $x$, and then
integrates the function $y \mapsto \int f(x,y)\, \mu(dx)$ w.r.t. $\nu$ over the variable $y$. We may also write this as
$\nu^y(\mu^x f(x,y))$.

(b) Several times below, we will prove a result for finite measures, and then refer to a "standard argument"
to lift the result to $\sigma$-finite measures. This is done as follows: Suppose that $\mu$ is $\sigma$-finite on $(S, \mathcal{S})$, and
that a result $\Phi$ has been proved to hold for finite measures. Since $\mu$ is $\sigma$-finite, there exists a sequence of
measurable sets $A_n \uparrow S$ such that $\mu A_n < \infty$ for all $n \in \mathbb{N}$. The measures $\mu_n := I_{A_n} \cdot \mu$ are finite on $(S, \mathcal{S})$,
so that result $\Phi$ holds for the $\mu_n$. By the MCT, if $f \in m\mathcal{S}^+$, then

\[ \mu f = \mu(\lim_n f I_{A_n}) = \lim_n \mu_n f \]

This is often enough to show that $\Phi$ holds for $\mu$ as well.



□ 






8.2.2 Products of Measure Spaces 

Given two measurable spaces $(S, \mathcal{S})$, $(T, \mathcal{T})$, we can construct a $\sigma$-algebra $\mathcal{S} \otimes \mathcal{T}$ on the
cartesian product $S \times T$, as follows: Define projections $\pi_S : S \times T \to S$, $\pi_T : S \times T \to T$ by

\[ \pi_S : (s, t) \mapsto s \qquad \pi_T : (s, t) \mapsto t \]

The interpretation is as follows: $(s, t)$ denotes a sample point in a space of "combined"
outcomes: i.e. $s \in S$ occurred and $t \in T$ occurred. Given such a combined outcome
$\omega = (s, t)$, $\pi_S(\omega) = s$ measures which outcome occurred in $S$, and $\pi_T(\omega) = t$ measures which
outcome occurred in $T$. Given that we know a combined outcome $\omega = (s, t)$, we should
also know the component outcomes $s$ and $t$. Thus the projection mappings $\pi_S, \pi_T$ should be
measurable. The product $\sigma$-algebra $\mathcal{S} \otimes \mathcal{T}$ is defined to be the smallest $\sigma$-algebra on $S \times T$
which makes these maps measurable. To recapitulate:



Definition 8.2.3 Let $(S, \mathcal{S})$ and $(T, \mathcal{T})$ be measurable spaces. Define
projections $\pi_S : S \times T \to S$, $\pi_T : S \times T \to T$ by

\[ \pi_S : (s, t) \mapsto s \qquad \pi_T : (s, t) \mapsto t \]

Then define $\mathcal{S} \otimes \mathcal{T} := \sigma(\pi_S, \pi_T)$ to be the smallest $\sigma$-algebra for which
both projections are measurable.

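Exercise 8.2.4 Let $(S, \mathcal{S})$ and $(T, \mathcal{T})$ be measurable spaces, and let $\mathcal{R} := \{A \times B : A \in \mathcal{S}, B \in \mathcal{T}\}$ be the
set of all measurable rectangles. Note that $\mathcal{R}$ is a $\pi$-system. Show that $\mathcal{S} \otimes \mathcal{T} = \sigma(\mathcal{R})$.
Hence the product $\sigma$-algebra is generated by the $\pi$-system of all measurable rectangles.
[Hint: $A \times B = (A \times T) \cap (S \times B)$, and $A \times T = \pi_S^{-1}[A]$.]

□

Exercise 8.2.5 Show that $\mathcal{B}(\mathbb{R}^2) = \mathcal{B}(\mathbb{R}) \otimes \mathcal{B}(\mathbb{R})$.

[Hint: Using Exercise 16.3.4, it is easy to see that $\mathcal{B}(\mathbb{R}^2) \supseteq \mathcal{B}(\mathbb{R}) \otimes \mathcal{B}(\mathbb{R})$. For the opposite direction, show
that any open set in $\mathbb{R}^2$ can be written as a countable union of sets of the form $U \times V$, where $U, V$ are open
intervals in $\mathbb{R}$.]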

□ 



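Suppose that $(S, \mathcal{S}, \mu)$ and $(T, \mathcal{T}, \nu)$ are measure spaces. We would like to construct a
measure $\mu \otimes \nu$ on $(S \times T, \mathcal{S} \otimes \mathcal{T})$. One way that suggests itself is to define

\[ (1) \quad (\mu \otimes \nu)B := \int \left( \int I_B(s, t)\, \nu(dt) \right) \mu(ds) = \mu^s(\nu^t I_B(s, t)) \qquad B \in \mathcal{S} \otimes \mathcal{T} \]

Another is to define it as

\[ (2) \quad (\mu \otimes \nu)B := \int \left( \int I_B(s, t)\, \mu(ds) \right) \nu(dt) = \nu^t(\mu^s I_B(s, t)) \qquad B \in \mathcal{S} \otimes \mathcal{T} \]

Exercise 8.2.6 Check that

\[ (\mu \otimes \nu)(A \times B) = \mu A \cdot \nu B \qquad A \in \mathcal{S}, B \in \mathcal{T} \]

for both of the above possible definitions of $\mu \otimes \nu$.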






□ 

We shall soon see that (i) the above definitions are both possible, and (ii) they coincide. 

We first investigate the possibility of defining $\mu \otimes \nu$ in the above manner. To be able to
perform a double integral $\iint f(s,t)\, \nu(dt)\, \mu(ds)$ it is necessary that:

(i) for each $s \in S$, the map $t \mapsto f(s,t)$ must be $\mathcal{T}$-measurable, so that we can calculate the
inner integral $\int f(s,t)\, \nu(dt)$;

(ii) the map $s \mapsto F(s) := \int f(s,t)\, \nu(dt)$ must be $\mathcal{S}$-measurable, so that we can calculate
the outer integral $\int F(s)\, \mu(ds)$.

The following lemma gives us what we need:

Lemma 8.2.7 Suppose that $(S, \mathcal{S})$ and $(T, \mathcal{T})$ are measurable spaces, that $\mu$ is a $\sigma$-finite
measure on $(S, \mathcal{S})$, and that $f : S \times T \to [0, \infty]$ is $\mathcal{S} \otimes \mathcal{T}$-measurable. Then

(i) For each $t \in T$, the map $s \mapsto f(s,t)$ is $\mathcal{S}$-measurable.

(ii) The map $t \mapsto \int f(s,t)\, \mu(ds)$ is $\mathcal{T}$-measurable.

Proof: We apply the Monotone Class Theorem (Thm. 8.1.5). First assume that $\mu$ is a finite
measure, and let

\[ \mathcal{H} = \{f \in m(\mathcal{S} \otimes \mathcal{T}) : f \text{ is bounded and satisfies (i) and (ii)}\} \]

It is easy to verify that $\mathcal{H}$ is a vector space (we need the finiteness of $\mu$ in order to avoid
expressions of the form $\infty - \infty$), and that each $I_{A \times B} \in \mathcal{H}$, where $A \in \mathcal{S}$, $B \in \mathcal{T}$. By
the Monotone Convergence Theorem (MCT), $\mathcal{H}$ is closed under bounded limits of increasing non-negative sequences. Moreover,
the set $\mathcal{R} := \{A \times B : A \in \mathcal{S}, B \in \mathcal{T}\}$ is a $\pi$-system with the property that $I_R \in \mathcal{H}$ for
every $R \in \mathcal{R}$, and thus by Thm. 8.1.5 every bounded $\mathcal{S} \otimes \mathcal{T}$-measurable function belongs
to $\mathcal{H}$ (since $\sigma(\mathcal{R}) = \mathcal{S} \otimes \mathcal{T}$). Now each non-negative measurable function $f$ is the limit of
bounded non-negative measurable functions ($f = \lim_n f \wedge n$), and thus another application
of the MCT shows that every $f \in m(\mathcal{S} \otimes \mathcal{T})^+$ satisfies (i) and (ii).

Now drop the assumption that $\mu$ is a finite measure. Because $\mu$ is $\sigma$-finite, we can
choose $A_n \uparrow S$ such that $\mu A_n < \infty$. The measures $\mu_n = I_{A_n} \cdot \mu$ are finite measures, and
thus each map $t \mapsto \int f(s,t)\, \mu_n(ds)$ is $\mathcal{T}$-measurable (where $f \geq 0$). Since $\int f(s,t)\, \mu(ds) =
\lim_n \int f(s,t)\, \mu_n(ds)$, the MCT implies that the result holds for $\mu$.

⊣

We now know that it is possible to define $\mu \otimes \nu$ in the ways indicated. What we don't
(yet) know is that these constructions define a measure, and that they coincide.
For definiteness, we fix one of the above definitions:

Definition 8.2.8 Suppose that $(S, \mathcal{S}, \mu)$ and $(T, \mathcal{T}, \nu)$ are $\sigma$-finite measure
spaces. Define a map $\mu \otimes \nu : \mathcal{S} \otimes \mathcal{T} \to [0, \infty]$ by

\[ (\mu \otimes \nu)B := \iint I_B(s, t)\, \nu(dt)\, \mu(ds) = \mu^s(\nu^t I_B(s, t)) \qquad B \in \mathcal{S} \otimes \mathcal{T} \]

$\mu \otimes \nu$ is called the product measure of $\mu, \nu$.

□ 






Exercise 8.2.9 Show that $\mu \otimes \nu$ defines a $\sigma$-finite measure on $(S \times T, \mathcal{S} \otimes \mathcal{T})$.

□ 

The next two results show that (modulo certain conditions) we can calculate the integral
w.r.t. $\mu \otimes \nu$ as a double integral, and the order of integration doesn't matter:

\[ \int f\, d(\mu \otimes \nu) = \iint f(s,t)\, \nu(dt)\, \mu(ds) = \iint f(s,t)\, \mu(ds)\, \nu(dt) \]

We first show this for non-negative measurable functions:



Theorem 8.2.10 (Tonelli)

Suppose that $(S, \mathcal{S}, \mu)$ and $(T, \mathcal{T}, \nu)$ are $\sigma$-finite measure spaces. If $f \in
m(\mathcal{S} \otimes \mathcal{T})^+$, then

\[ (\mu \otimes \nu)f = \mu^s(\nu^t f(s, t)) = \nu^t(\mu^s f(s, t)) \qquad (*) \]



Proof: We use the Monotone Class Theorem (Thm. 8.1.5). First assume that $\mu, \nu$ are finite
measures. The result is obvious if $f = I_{A \times B}$, where $A \times B$ is a measurable rectangle (or cf.
Exercise 16.3.6). The class

\[ \mathcal{H} = \{f \in m(\mathcal{S} \otimes \mathcal{T}) : f \text{ is bounded and satisfies } (*)\} \]

is easily seen to satisfy the requirements of Thm. 8.1.5, and thus $\mathcal{H}$ contains
every bounded $\mathcal{S} \otimes \mathcal{T}$-measurable function. The result for arbitrary non-negative $f$ follows
by the MCT.

A standard argument lifts the result to the case where $\mu, \nu$ are merely $\sigma$-finite.

⊣
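
On finite sets Tonelli's Theorem reduces to interchanging two finite sums, which makes it easy to see the three quantities in (*) side by side. The following sketch is illustrative: the weights of the two measures, the integrand `f` and the helper `integrate` are all hypothetical choices for this example.

```python
from fractions import Fraction

# Two finite (weighted counting) measures, given by their point masses.
mu = {0: Fraction(1, 2), 1: Fraction(3, 2), 2: Fraction(2)}
nu = {"a": Fraction(1, 3), "b": Fraction(5, 3)}

def f(s, t):
    """A non-negative integrand on S x T."""
    return Fraction(s * s + (1 if t == "a" else 2))

def integrate(measure, g):
    """Integral of g w.r.t. a discrete measure: sum of g(x) times the mass at x."""
    return sum((g(x) * w for x, w in measure.items()), Fraction(0))

# mu^s(nu^t f(s, t)): integrate over t first, then over s.
s_outer = integrate(mu, lambda s: integrate(nu, lambda t: f(s, t)))

# nu^t(mu^s f(s, t)): integrate over s first, then over t.
t_outer = integrate(nu, lambda t: integrate(mu, lambda s: f(s, t)))

# Integral w.r.t. the product measure, whose point masses are mu{s} * nu{t}.
product = {(s, t): mu[s] * nu[t] for s in mu for t in nu}
joint = integrate(product, lambda st: f(*st))
```

All three values agree. The content of the theorem is that the same interchange remains valid for $\sigma$-finite measures and arbitrary non-negative measurable integrands, where the sums become genuine integrals.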

As a by-product, we obtain the result that our two possible definitions of $\mu \otimes \nu$ as iterated
integrals coincide: If $B \in \mathcal{S} \otimes \mathcal{T}$, then $I_B$ is a non-negative measurable function, and we may
apply Tonelli's Thm.

For non-negative functions $f$, the integral $\mu f$ always makes sense, but we may have
$\mu f = \infty$. For arbitrary measurable $f$, we have to be more careful:



Theorem 8.2.11 (Fubini)

Suppose that $(S, \mathcal{S}, \mu)$ and $(T, \mathcal{T}, \nu)$ are $\sigma$-finite measure spaces. If $f \in
\mathcal{L}^1(S \times T, \mathcal{S} \otimes \mathcal{T}, \mu \otimes \nu)$, then

\[ (\mu \otimes \nu)f = \mu^s(\nu^t f(s,t)) = \nu^t(\mu^s f(s,t)) \]

Here the map $t \mapsto \mu^s f(s,t)$ is defined for $\nu$-a.e. $t \in T$ and belongs to $\mathcal{L}^1(T, \mathcal{T}, \nu)$.
Similarly, the map $s \mapsto \nu^t f(s,t)$ is defined for $\mu$-a.e. $s \in S$ and belongs to $\mathcal{L}^1(S, \mathcal{S}, \mu)$.



Proof: The result holds for $|f|$, by Tonelli's Thm., and hence $N_S = \{s \in S : \nu^t |f(s,t)| = +\infty\}$
is $\mu$-null, and $N_T = \{t \in T : \mu^s |f(s,t)| = +\infty\}$ is $\nu$-null. Redefine $f(s,t)$ to be zero when
either $s \in N_S$ or $t \in N_T$; this won't affect the integral of $f$, by Thm. 6.3.9. The result follows
by splitting $f$ into positive and negative parts.






⊣

Remarks 8.2.12 (a) Fubini's Theorem allows the interchange of the order of integration, provided the
integrand is integrable w.r.t. the product measure. It follows from Fubini's Theorem that



provided that $f \in \mathcal{L}^1$. See Exercise 16.3.13 for what can happen if $f \notin \mathcal{L}^1$.

(b) Fubini's Theorem also easily extends to arbitrary finite products: If $(S_i, \mathcal{S}_i, \mu_i)$ are $\sigma$-finite measure spaces
for $i = 1, \dots, n$, then

(i) $\mathcal{S}_1 \otimes \dots \otimes \mathcal{S}_n$ is the $\sigma$-algebra on $S_1 \times \dots \times S_n$ which is generated by the projections $\pi_i : S_1 \times
\dots \times S_n \to S_i : (s_1, \dots, s_n) \mapsto s_i$. It is also generated by the family of measurable "rectangles"
$\mathcal{R} = \{A_1 \times \dots \times A_n : A_i \in \mathcal{S}_i \text{ for } i = 1, \dots, n\}$.

(ii) $\mu_1 \otimes \dots \otimes \mu_n$ is the unique measure on $\mathcal{S}_1 \otimes \dots \otimes \mathcal{S}_n$ which assigns to every rectangle the measure

\[ (\mu_1 \otimes \dots \otimes \mu_n)(A_1 \times \dots \times A_n) = \mu_1 A_1 \cdots \mu_n A_n \]

(iii) Fubini's Theorem states that if $f : S_1 \times \dots \times S_n \to \mathbb{R}$ is $\mu_1 \otimes \dots \otimes \mu_n$-integrable, then

\[ \int f\, d(\mu_1 \otimes \dots \otimes \mu_n) = \int \left( \int \cdots \left( \int f\, d\mu_n \right) \cdots d\mu_2 \right) d\mu_1 \]

and that any interchange of the order of integration is permissible.



Exercise 8.2.13 Let ^ ^ 

Show that 

f{x,y) \{dv) X{dx) = j f{x,y) \{dx) \{dy) = -j 

What can you conclude about

\[ \int_{[0,1] \times [0,1]} f\, d(\lambda \otimes \lambda)\ ? \]



□ 



□ 



8.3 Independence 

We return now to the notion of independence. Let $(\Omega, \mathcal{F}, \mathbb{P})$ be a probability space. Recall
that we have made the following definitions:

• Events $F_1, \dots, F_n \in \mathcal{F}$ are independent iff $\mathbb{P}(F_1 \cap \dots \cap F_n) = \prod_{k=1}^{n} \mathbb{P}(F_k)$.

• Sub-$\sigma$-algebras $\mathcal{G}_1, \dots, \mathcal{G}_n$ are independent iff whenever $G_k \in \mathcal{G}_k$ for $k = 1, \dots, n$ then
$G_1, \dots, G_n$ are independent events.

• A random variable $X$ is independent of a $\sigma$-algebra $\mathcal{G}$ iff $\sigma(X), \mathcal{G}$ are independent
$\sigma$-algebras.

Other variations (e.g. what it means for random variables $X_1, \dots, X_n$ to be independent)
should be obvious.

If two $\pi$-systems are independent, so are the $\sigma$-algebras generated by those $\pi$-systems:






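Theorem 8.3.1 Let $\{\mathcal{C}_t\}_{t \in T}$ be a collection of independent $\pi$-systems
on $(\Omega, \mathcal{F}, \mathbb{P})$. Then $\{\sigma(\mathcal{C}_t)\}_{t \in T}$ is a collection of independent $\sigma$-algebras.

Proof: We must show that if $t_1, \dots, t_n \in T$ are distinct, then $\sigma(\mathcal{C}_{t_1}), \dots, \sigma(\mathcal{C}_{t_n})$ are independent.
We proceed by recursion. Fix $t_1, \dots, t_n \in T$, define $\mathcal{F}_{t_k} := \sigma(\mathcal{C}_{t_k})$, and also fix
$C_{t_2} \in \mathcal{C}_{t_2}, \dots, C_{t_n} \in \mathcal{C}_{t_n}$. Let

\[ \mathcal{D} := \{F \in \mathcal{F}_{t_1} : \mathbb{P}(F \cap C_{t_2} \cap \dots \cap C_{t_n}) = \mathbb{P}F \cdot \mathbb{P}C_{t_2} \cdot \ldots \cdot \mathbb{P}C_{t_n}\} \]

By assumption, $\mathcal{C}_{t_1} \subseteq \mathcal{D}$. Using the continuity of measure, it is straightforward to check that
$\mathcal{D}$ is a $\lambda$-system. Thus by Thm. 8.1.3 we have $\mathcal{D} = \mathcal{F}_{t_1}$ for every selection of $C_{t_k} \in \mathcal{C}_{t_k}$,
$k = 2, 3, \dots, n$, and hence the families $\mathcal{F}_{t_1}, \mathcal{C}_{t_2}, \mathcal{C}_{t_3}, \dots, \mathcal{C}_{t_n}$ are independent. Repeat: Fix
$F_{t_1} \in \mathcal{F}_{t_1}$ and $C_{t_3} \in \mathcal{C}_{t_3}, \dots, C_{t_n} \in \mathcal{C}_{t_n}$. Redefine

\[ \mathcal{D} := \{F \in \mathcal{F}_{t_2} : \mathbb{P}(F_{t_1} \cap F \cap C_{t_3} \cap \dots \cap C_{t_n}) = \mathbb{P}F_{t_1} \cdot \mathbb{P}F \cdot \mathbb{P}C_{t_3} \cdot \ldots \cdot \mathbb{P}C_{t_n}\} \]

Again, $\mathcal{D}$ is a $\lambda$-system containing $\mathcal{C}_{t_2}$, and hence by Thm. 8.1.3 $\mathcal{D} = \mathcal{F}_{t_2}$. From this it
follows that $\mathcal{F}_{t_1}, \mathcal{F}_{t_2}, \mathcal{C}_{t_3}, \dots, \mathcal{C}_{t_n}$ are independent. Repeat the construction $n - 2$ more times
to deduce that $\mathcal{F}_{t_1}, \dots, \mathcal{F}_{t_n}$ are independent.

⊣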

If you're familiar with the elementary definition of independence for random variables, 
you will want to know the following: 

Exercise 8.3.2 Random variables $X, Y$ are said to be independent in the elementary sense iff

\[ \mathbb{P}(X \leq x, Y \leq y) = \mathbb{P}(X \leq x) \cdot \mathbb{P}(Y \leq y) \quad \text{for all } x, y \in \mathbb{R} \]

Prove that random variables are independent iff they are independent in the elementary sense.
[Hint: First show that the set $\mathcal{X} := \{\{X \leq x\} : x \in \mathbb{R}\}$ is a $\pi$-system which generates $\sigma(X)$.]

□ 

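If $\{\mathcal{G}_t\}_{t \in T}$ is a family of sub-$\sigma$-algebras of $\mathcal{F}$, define

\[ \bigvee_{t \in T} \mathcal{G}_t := \sigma\Big(\bigcup_{t \in T} \mathcal{G}_t\Big) \]

Proposition 8.3.3 Suppose that $\{\mathcal{G}_t\}_{t \in T}$ is a family of independent $\sigma$-algebras, and that $\mathcal{P}$ is
a partition of $T$. For $P \in \mathcal{P}$, define $\mathcal{G}_P := \bigvee_{t \in P} \mathcal{G}_t$. Then $\{\mathcal{G}_P\}_{P \in \mathcal{P}}$ is a family of independent
$\sigma$-algebras.

Proof: For $P \in \mathcal{P}$, define $\mathcal{C}_P$ to be the set of all finite intersections of members of $\bigcup_{t \in P} \mathcal{G}_t$.
Then each $\mathcal{C}_P$ is a $\pi$-system, and $\sigma(\mathcal{C}_P) = \mathcal{G}_P$. The independence of the $\mathcal{G}_t$, $t \in T$, is easily
seen to imply the independence of the $\mathcal{C}_P$, $P \in \mathcal{P}$, and thus the independence of the $\mathcal{G}_P$,
$P \in \mathcal{P}$ (by Thm. 8.3.1).

⊣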

The following result is now easy to prove within the measure-theoretic framework, but very
difficult to prove outside it:






Theorem 8.3.4 Suppose that $X_1, \dots, X_{n+m}$ are independent random
variables, and that $f : \mathbb{R}^n \to \mathbb{R}$ and $g : \mathbb{R}^m \to \mathbb{R}$ are Borel functions. Then
$Y = f(X_1, \dots, X_n)$ and $Z = g(X_{n+1}, \dots, X_{n+m})$ are independent.

Exercise 8.3.5 Prove Thm. 8.3.4. 

□ 

Earlier in this course, we proved the following result using a "standard machine" approach: 

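Theorem 8.3.6 Suppose that $X, Y \in \mathcal{L}^1(\Omega, \mathcal{F}, \mathbb{P})$ are independent random
variables. Then $XY \in \mathcal{L}^1(\Omega, \mathcal{F}, \mathbb{P})$ also, and

\[ \mathbb{E}XY = \mathbb{E}X \cdot \mathbb{E}Y \quad \text{i.e.} \quad \mathrm{Cov}(X, Y) = 0 \]

The same result holds in the extended sense if $X, Y \geq 0$.

Actually, there is an easier proof of Thm. 8.3.6, if we adopt another point of view: If $X, Y$
are random variables, then $(X, Y)$ is a random vector, i.e. a map

\[ (X, Y) : \Omega \to \mathbb{R}^2 : \omega \mapsto (X(\omega), Y(\omega)) \]

Its distribution is a probability measure on $(\mathbb{R}^2, \mathcal{B}(\mathbb{R}^2))$ given by $\mu_{X,Y}(B) = \mathbb{P}\{(X, Y) \in B\}$,
where $B \in \mathcal{B}(\mathbb{R}^2)$. If $\mu_X, \mu_Y$ are the distributions of $X, Y$ respectively, then the product
measure $\mu_X \otimes \mu_Y$ is another probability measure on $(\mathbb{R}^2, \mathcal{B}(\mathbb{R}^2))$. It turns out that $X, Y$ are
independent iff $\mu_{X,Y} = \mu_X \otimes \mu_Y$.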

Theorem 8.3.7 Let $X, Y$ be random variables on $(\Omega, \mathcal{F}, \mathbb{P})$. Let
$\mu_{X,Y}, \mu_X, \mu_Y$ be the distributions of the random elements $(X, Y)$, $X$ and
$Y$. Then $X, Y$ are independent iff $\mu_{X,Y} = \mu_X \otimes \mu_Y$.

Proof: Suppose that $X, Y$ are independent. If $A \times B$ is a measurable rectangle in $\mathcal{B}(\mathbb{R}^2) =
\mathcal{B}(\mathbb{R}) \otimes \mathcal{B}(\mathbb{R})$, then

\[ \mu_{X,Y}(A \times B) = \mathbb{P}(X \in A, Y \in B) = \mathbb{P}(X \in A)\mathbb{P}(Y \in B) = \mu_X A \cdot \mu_Y B = (\mu_X \otimes \mu_Y)(A \times B) \]

Hence $\mu_{X,Y}$ and $\mu_X \otimes \mu_Y$ agree on a $\pi$-system that generates $\mathcal{B}(\mathbb{R}^2)$ (the family of measurable
rectangles). By Corollary 8.1.4, they are equal: $\mu_{X,Y} = \mu_X \otimes \mu_Y$.

Conversely, if $\mu_{X,Y} = \mu_X \otimes \mu_Y$, then if $x, y \in \mathbb{R}$, we have

\[ \mathbb{P}(X \leq x, Y \leq y) = \mu_{X,Y}((-\infty, x] \times (-\infty, y]) = \mu_X(-\infty, x] \cdot \mu_Y(-\infty, y] = \mathbb{P}(X \leq x)\mathbb{P}(Y \leq y) \]

Hence $X, Y$ are independent (cf. Exercise 8.3.2).

⊣
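
For random variables with finitely many values, the criterion of Thm. 8.3.7 can be tested by comparing point masses. The sketch below uses two rolls of a fair die as the probability space; the helper names (`dist`, `is_product`, `expect`) and the choice of random variables are assumptions of this illustration.

```python
from fractions import Fraction
from itertools import product

# Probability space: two rolls of a fair die, all 36 outcomes equally likely.
P = {w: Fraction(1, 36) for w in product(range(1, 7), repeat=2)}

def dist(Z):
    """Distribution (push-forward of P) of a random element Z, as a dict of point masses."""
    out = {}
    for w, p in P.items():
        out[Z(w)] = out.get(Z(w), Fraction(0)) + p
    return out

def is_product(X, Y):
    """Check whether the joint distribution of (X, Y) equals the product of the marginals."""
    joint = dist(lambda w: (X(w), Y(w)))
    mx, my = dist(X), dist(Y)
    return all(joint.get((x, y), Fraction(0)) == mx[x] * my[y] for x in mx for y in my)

def expect(Z):
    """Expectation of a real-valued random variable Z."""
    return sum((Z(w) * p for w, p in P.items()), Fraction(0))

def X(w):          # first roll
    return w[0]

def Y(w):          # second roll
    return w[1]

def S(w):          # sum of both rolls, which depends on X
    return w[0] + w[1]

independent_XY = is_product(X, Y)
independent_XS = is_product(X, S)
```

The two rolls have a joint distribution equal to the product of the marginals, and $\mathbb{E}XY = \mathbb{E}X \cdot \mathbb{E}Y$ as in Thm. 8.3.6. The first roll and the sum fail the product test, and for them the expectation of the product differs from the product of the expectations.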

Exercise 8.3.8 Use the Change of Variables Thm. to show that

\[ \int XY\, d\mathbb{P} = \int xy\, d\mu_{X,Y} \]

Now prove Thm. 8.3.6 once more, using Fubini's Theorem.

□ 






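Definition 8.3.9 Let $\mu, \nu$ be probability measures on $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$. The
convolution $\mu * \nu$ of $\mu$ and $\nu$ is defined to be the pushforward of the
product measure $\mu \otimes \nu$ along the measurable map $+ : \mathbb{R} \times \mathbb{R} \to \mathbb{R} :
(x, y) \mapsto x + y$.

\[ \mu * \nu := (\mu \otimes \nu) \circ (+)^{-1} \quad \text{i.e.} \quad (\mu * \nu)B := (\mu \otimes \nu)\{(x, y) : x + y \in B\} \]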



Exercise 8.3.10 (a) Show that

\[ (\mu * \nu)B = \int \nu(B - x)\, \mu(dx) = \int \mu(B - y)\, \nu(dy) \]

where $B - x = \{b - x : b \in B\}$.

(b) Show that $*$ is a commutative associative operation on the set of probability measures on $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$.
Show that $\delta_0$ is an identity element for $*$.

(c) Show that if $X, Y$ are independent random variables, then $\mu_{X+Y} = \mu_X * \mu_Y$, where $\mu_X$ is the distribution
of $X$, etc.

□ 
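
For probability measures concentrated on finitely many points, the convolution of Defn. 8.3.9 is a finite sum over pairs $(x, y)$, collected according to the value of $x + y$. The following sketch (with the hypothetical helper `convolve`, and a die and a coin as sample distributions) illustrates the properties stated in Exercise 8.3.10(b); it is of course not a proof.

```python
from fractions import Fraction

def convolve(mu, nu):
    """Push the product of two discrete measures forward along (x, y) -> x + y."""
    out = {}
    for x, p in mu.items():
        for y, q in nu.items():
            out[x + y] = out.get(x + y, Fraction(0)) + p * q
    return out

die = {k: Fraction(1, 6) for k in range(1, 7)}       # fair die
coin = {0: Fraction(1, 2), 1: Fraction(1, 2)}        # fair coin with values 0 and 1
delta0 = {0: Fraction(1)}                            # point mass at 0

# Distribution of the sum of two independent fair dice.
two_dice = convolve(die, die)
```

Here `two_dice` is the familiar triangular distribution on $\{2, \dots, 12\}$, in line with part (c) of the exercise.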






Chapter 9 

The $\mathcal{L}^p$-Spaces and Fourier Analysis



9.1 $\mathcal{L}^p$ Spaces

9.1.1 Integration of complex-valued functions

This paragraph concerns the integration of complex-valued functions. Recall that any complex
number $z \in \mathbb{C}$ can be decomposed into a real part and an imaginary part:

\[ z = a + ib = \mathrm{Re}(z) + i\, \mathrm{Im}(z) \quad \text{where } \mathrm{Re}(z) = a,\ \mathrm{Im}(z) = b \]

We can also write $z$ in modulus-argument form: Recall that $e^{i\theta} := \cos\theta + i \sin\theta$. Then

\[ z = re^{i\theta} \quad \text{where } r = \sqrt{a^2 + b^2} = |z| \text{ and } \tan\theta = \frac{b}{a} \]

The complex conjugate of $z = a + ib = re^{i\theta}$ is $\bar{z} := a - ib = re^{-i\theta}$.

Similarly, if $f$ is a complex-valued function, then $f$ can be decomposed into a real and
imaginary part

\[ f = \mathrm{Re} f + i\, \mathrm{Im} f = u + iv \]

where $u = \mathrm{Re} f$ and $v = \mathrm{Im} f$ are real-valued functions.

Definition 9.1.1 Let $(\Omega, \mathcal{F}, \mu)$ be a measure space, and let $f : \Omega \to \mathbb{C}$ be a complex-valued
function. We say that $f$ is $\mathcal{F}$-measurable if and only if both the real-valued
functions $\mathrm{Re} f$ and $\mathrm{Im} f$ are $\mathcal{F}$-measurable.

Now note the following:

(i) If $f = u + iv$ is $\mathcal{F}$-measurable, then so is its modulus, the real-valued function $|f| =
\sqrt{u^2 + v^2}$.

(ii) Note also that $\bar{f} = u - iv$ is measurable if $f$ is.

(iii) We have

\[ \max\{|u|, |v|\} = (\max\{u^2, v^2\})^{1/2} \leq (u^2 + v^2)^{1/2} \leq (u^2 + 2|u||v| + v^2)^{1/2} = |u| + |v| \]

and hence that

\[ |u|, |v| \leq |f| \leq |u| + |v| \]



It follows that

\[ \int |f|\, d\mu < \infty \iff \int |\mathrm{Re}(f)|\, d\mu < \infty \text{ and } \int |\mathrm{Im}(f)|\, d\mu < \infty \]

These observations ensure that the following definition makes sense:



Definition 9.1.2 Let $(\Omega, \mathcal{F}, \mu)$ be a measure space, and let $f : \Omega \to \mathbb{C}$ be a complex-valued
function. We say that $f$ is $\mu$-integrable if and only if $|f|$ is $\mu$-integrable (and hence
iff $\mathrm{Re}(f)$, $\mathrm{Im}(f)$ are both $\mu$-integrable). We then define

\[ \int f\, d\mu := \int \mathrm{Re} f\, d\mu + i \int \mathrm{Im} f\, d\mu \]



The following properties can easily be established and are left as an exercise: 



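Proposition 9.1.3 (a) If $f, g$ are $\mu$-integrable complex-valued functions, then so is $f + g$,
and $\int f + g\, d\mu = \int f\, d\mu + \int g\, d\mu$.

(b) If $f$ is $\mu$-integrable and $c \in \mathbb{C}$, then $\int cf\, d\mu = c \int f\, d\mu$.

(c) If $f$ is $\mu$-integrable, then so is $\bar{f}$ and $\int \bar{f}\, d\mu = \overline{\int f\, d\mu}$.

(d) $\left| \int f\, d\mu \right| \leq \int |f|\, d\mu$



Exercise 9.1.4 Prove Propn. 9.1.3.

[Hint for (d): There is $\theta \in \mathbb{R}$ such that $\int f\, d\mu = |\int f\, d\mu|\, e^{i\theta}$. So $|\int f\, d\mu| = e^{-i\theta} \int f\, d\mu = \mathrm{Re} \int e^{-i\theta} f\, d\mu =
\int \mathrm{Re}(e^{-i\theta} f)\, d\mu \leq \int |e^{-i\theta} f|\, d\mu$.]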



□ 
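
The rotation trick in the hint for (d) can be followed numerically on a finite measure space, where the integral is a weighted sum. The sketch below is an illustration with made-up weights and values; the names `mu`, `f` and `integral` are assumptions of this example, and floating point comparisons are made up to a small tolerance.

```python
import cmath

# A finite measure space: point masses mu{w}, and a complex-valued function f.
mu = {"a": 0.5, "b": 1.5, "c": 2.0}
f = {"a": 1 + 2j, "b": -3 + 0.5j, "c": 0.25 - 1j}

def integral(g):
    """Integral of g w.r.t. mu, i.e. the weighted sum of its values."""
    return sum(g[w] * mu[w] for w in mu)

# Defn. 9.1.2: integrate real and imaginary parts separately.
I = integral(f)
by_parts = integral({w: z.real for w, z in f.items()}) + 1j * integral(
    {w: z.imag for w, z in f.items()}
)

# Integral of the modulus |f|.
abs_integral = integral({w: abs(z) for w, z in f.items()})

# Rotate by e^{-i theta}, where theta is the argument of the integral.
theta = cmath.phase(I)
rotated = cmath.exp(-1j * theta) * I
```

After the rotation the integral is (up to rounding) the non-negative real number $|\int f\, d\mu|$, which is the first step of the hint; the comparison with $\int |f|\, d\mu$ then reduces to the real-valued case.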



9.1.2 Definition of $\mathcal{L}^p$-spaces



Definition 9.1.5 Suppose that $(\Omega, \mathcal{F}, \mu)$ is a measure space, and that $1 \leq p < +\infty$
(where $p$ need not be an integer).

$\mathcal{L}^p(\Omega, \mathcal{F}, \mu)$ is the set of all (real- or complex-valued) functions $f \in m\mathcal{F}$ such that

\[ \int |f|^p\, d\mu < +\infty \]

For $f \in \mathcal{L}^p(\Omega, \mathcal{F}, \mu)$, we define the $p$-norm of $f$ to be

\[ \|f\|_p = \left( \int |f|^p\, d\mu \right)^{1/p} \]

When the underlying measure space is clear from context, and there is little danger of
confusion, we will write $\mathcal{L}^p$ instead of $\mathcal{L}^p(\Omega, \mathcal{F}, \mu)$.

Remarks 9.1.6 (a) Note that $\mathcal{L}^1$ is just the set of all $\mu$-integrable functions (since $f$ is integrable if and
only if $|f| = |f|^1$ is integrable).

(b) Note also that $f \in \mathcal{L}^p$ if and only if $|f|^p \in \mathcal{L}^1$.






□ 

Lemma 9.1.7 The $\mathcal{L}^p$ spaces are vector spaces.



Exercise 9.1.8 Prove Lemma 9.1.7.

[Hint: If $1 \leq p < \infty$, then $|f + g|^p \leq (|f| + |g|)^p \leq \max\{(2|f|)^p, (2|g|)^p\} \leq 2^p(|f|^p + |g|^p)$.]



□ 
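
The pointwise inequality in the hint can be spot-checked numerically before attempting the proof. This is only a sanity check on a hypothetical sample of complex numbers and exponents, not an argument; the name `holds` and the sample values are assumptions of the example.

```python
# Sample values of f(w) and g(w), and a few exponents p >= 1 (not necessarily integers).
values = [0, 1, -2.5, 0.3 + 4j, -1 - 1j, 7j]
exponents = [1, 1.5, 2, 3.7]

def holds(a, b, p, tol=1e-9):
    """Check |a + b|^p <= max((2|a|)^p, (2|b|)^p) <= 2^p (|a|^p + |b|^p), up to rounding."""
    middle = max((2 * abs(a)) ** p, (2 * abs(b)) ** p)
    return abs(a + b) ** p <= middle + tol and middle <= 2**p * (abs(a) ** p + abs(b) ** p) + tol

all_hold = all(holds(a, b, p) for a in values for b in values for p in exponents)
```

Integrating the inequality shows that $f + g \in \mathcal{L}^p$ whenever $f, g \in \mathcal{L}^p$, which is the only non-trivial point in Lemma 9.1.7.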



□ 



It will become clear that the $\mathcal{L}^p$ spaces are almost Banach spaces (with norms $\|\cdot\|_p$), and
that $\mathcal{L}^2$ is almost a Hilbert space. For our purposes, the most important spaces are $\mathcal{L}^1$ and
$\mathcal{L}^2$, and we shall give a complete separate account of the theory for these spaces.

9.1.3 $\mathcal{L}^1$ and $\mathcal{L}^2$

Consider the $\mathcal{L}^1$-norm $\|\cdot\|_1$ on $\mathcal{L}^1(\Omega, \mathcal{F}, \mu)$. We do not yet know that it is a norm. However:

• If $f$ is $\mu$-integrable, then $\|f\|_1 \geq 0$, and $\|0\|_1 = 0$.

• If $f$ is $\mu$-integrable and $c \in \mathbb{C}$, then clearly

\[ \|cf\|_1 = \int |cf|\, d\mu = |c| \int |f|\, d\mu = |c| \|f\|_1 \]

• The triangle inequality for complex moduli (or for absolute values in the real case) yields
the triangle inequality for $\|\cdot\|_1$:

\[ \|f + g\|_1 := \int |f + g|\, d\mu \leq \int |f| + |g|\, d\mu = \int |f|\, d\mu + \int |g|\, d\mu = \|f\|_1 + \|g\|_1 \]

There is one more condition that must be satisfied for $\|\cdot\|_1$ to be a norm, and that is
$\|f\|_1 = 0$ if and only if $f = 0$.



This condition fails, however, but it is nearly true: If $\|f\|_1 = 0$, then $\int |f|\, d\mu = 0$. Now as
$|f| \geq 0$, we know that this implies that $|f| = 0$ $\mu$-a.e., and thus that $f = 0$ $\mu$-a.e. We thus
have

• $\|f\|_1 = 0$ if and only if $f = 0$ $\mu$-a.e.

Thus, provided we consider two functions $f, g \in \mathcal{L}^1(\Omega, \mathcal{F}, \mu)$ to be the same when we have
merely $f = g$ $\mu$-a.e., the function $\|\cdot\|_1$ is a norm on $\mathcal{L}^1(\Omega, \mathcal{F}, \mu)$.

Every norm has associated with it a notion of convergence. If $f_n, f \in \mathcal{L}^1(\Omega, \mathcal{F}, \mu)$ for
$n \in \mathbb{N}$, we define

\[ f_n \to f \text{ in } \mathcal{L}^1 \iff \|f_n - f\|_1 \to 0 \text{ as } n \to \infty \]

We say that $f_n$ converges to $f$ in mean, or in $\mathcal{L}^1$.

Exercise 9.1.9 Thus far, we have only used one notion of convergence, namely $\mu$-almost everywhere
convergence. This is pointwise convergence, possibly excluding a set of measure zero, i.e.

\[ f_n \to f\ \mu\text{-a.e.} \iff \mu\{\omega \in \Omega : f_n(\omega) \not\to f(\omega)\} = 0 \]

This is the type of convergence used in the Monotone and Dominated Convergence Theorems. We now have a
new notion of convergence (actually, we have a new notion for every $\mathcal{L}^p$-norm), and the two are quite different:






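(a) Let $(\Omega, \mathcal{F}, \mu) = ((0, 1], \mathcal{B}((0, 1]), \lambda)$, and define $f_n = n I_{(0, \frac1n]}$. Show that $f_n \to 0$ $\mu$-a.e., but that $(f_n)_n$ does
not converge in $\mathcal{L}^1$.

(b) Let $(\Omega, \mathcal{F}, \mu) = ((0, 1], \mathcal{B}((0, 1]), \lambda)$. Enumerate subintervals of $(0, 1]$ as follows:

\[ A_1 = (0, 1] \quad A_2 = (0, \tfrac12] \quad A_3 = (\tfrac12, 1] \quad A_4 = (0, \tfrac14] \quad A_5 = (\tfrac14, \tfrac12] \quad A_6 = (\tfrac12, \tfrac34] \quad A_7 = (\tfrac34, 1] \quad A_8 = (0, \tfrac18] \quad \dots \]

so that

\[ A_{2^n + k} = \left( \tfrac{k}{2^n}, \tfrac{k+1}{2^n} \right] \quad \text{for } n, k \in \mathbb{N},\ 0 \leq k < 2^n \]

Now define $f_n = I_{A_n}$. Show that $f_n \to 0$ in $\mathcal{L}^1$ but that $(f_n)_n$ does not converge $\mu$-a.e.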

□ 
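
Both sequences in Exercise 9.1.9 can be explored with exact arithmetic. The sketch below is an aid to intuition and not a solution: the function names, the use of `int.bit_length` to decode the index $2^j + k$, and the sample point $x = 1/3$ are assumptions of this illustration.

```python
from fractions import Fraction

def spike(n, x):
    """Part (a): f_n(x) = n on (0, 1/n] and 0 elsewhere."""
    return n if 0 < x <= Fraction(1, n) else 0

def spike_l1_norm(n):
    """L^1 norm of the spike: height n times interval length 1/n."""
    return n * Fraction(1, n)

def interval(n):
    """Part (b): endpoints (left, right] of A_n, where n = 2^j + k with 0 <= k < 2^j."""
    j = n.bit_length() - 1
    k = n - 2**j
    return Fraction(k, 2**j), Fraction(k + 1, 2**j)

def indicator(n, x):
    """f_n(x) = I_{A_n}(x)."""
    left, right = interval(n)
    return 1 if left < x <= right else 0

def indicator_l1_norm(n):
    """L^1 norm of I_{A_n}, i.e. the length of A_n."""
    left, right = interval(n)
    return right - left

x = Fraction(1, 3)

# In each block 2^j <= n < 2^(j+1) the intervals A_n partition (0, 1],
# so exactly one of them contains x.
hits_per_block = [sum(indicator(n, x) for n in range(2**j, 2 ** (j + 1))) for j in range(8)]
```

In (a) the values at a fixed point are eventually 0 while the norms stay equal to 1. In (b) the norms tend to 0, yet the values at $x$ equal 1 once in every block and 0 otherwise, so they do not settle down.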

Remarks 9.1.10 Note that the Monotone Convergence Theorem and Lebesgue Dominated Convergence
Theorems give conditions which ensure that $\mu$-a.e. convergence implies $\mathcal{L}^1$-convergence. For example, the LDCT states that
if $h_n \to h$ $\mu$-a.e., and if there is a $\mu$-integrable $g$ such that $|h_n| \leq g$ $\mu$-a.e. for all $n$, then $\int h_n\, d\mu \to \int h\, d\mu$.

Now suppose that $f_n \to f$ $\mu$-a.e., and that there is a $\mu$-integrable $g$ such that $|f_n| \leq g$ $\mu$-a.e. Define
$h_n := |f_n - f|$, $h := 0$. Then $|h_n| \leq |f_n| + |f| \leq 2g$, and $2g$ is integrable also. Thus $\int |h_n|\, d\mu \to \int h\, d\mu$, i.e.
$\|f_n - f\|_1 \to 0$, i.e. $f_n \to f$ in $\mathcal{L}^1$.

□ 

A sequence $(f_n)_{n \in \mathbb{N}}$ is a Cauchy sequence in $\mathcal{L}^1$ if and only if $\|f_n - f_m\|_1 \to 0$ as $n, m \to \infty$
(i.e. iff $\sup_{m,n \geq N} \|f_n - f_m\|_1 \to 0$ as $N \to \infty$). Now recall that a complete normed vector
space (that is, a normed vector space in which every Cauchy sequence converges) is called a
Banach space. We will show that $\mathcal{L}^1$ is a Banach space, when equipped with the $\mathcal{L}^1$-norm.
First, we need a lemma:

Lemma 9.1.11 If $\sum_{n=1}^{\infty} \int |f_n|\, d\mu < \infty$, then $\sum_{n=1}^{\infty} f_n$ converges absolutely $\mu$-a.e. Moreover,
the function $\sum_{n=1}^{\infty} f_n$ is integrable, and $\int \sum_{n=1}^{\infty} f_n\, d\mu = \sum_{n=1}^{\infty} \int f_n\, d\mu$.

Proof: Define $g := \sum_{n=1}^{\infty} |f_n| = \lim_{m \to \infty} \sum_{n=1}^{m} |f_n|$. By the Monotone Convergence Theorem
and the linearity of the integral, $\int g\, d\mu = \sum_{n=1}^{\infty} \int |f_n|\, d\mu < \infty$ and hence $g$ is integrable.
In particular, $|g| < \infty$ $\mu$-a.e., and hence $\sum_{n=1}^{\infty} f_n$ converges absolutely $\mu$-a.e. Now
as $|\sum_{n=1}^{m} f_n| \leq \sum_{n=1}^{m} |f_n| \leq g$ for all $m \in \mathbb{N}$, and $g$ is integrable, the Lebesgue Dominated
Convergence Theorem shows that $\int \sum_{n=1}^{\infty} f_n\, d\mu = \sum_{n=1}^{\infty} \int f_n\, d\mu$.

⊣



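Theorem 9.1.12 (a) Suppose that $(f_n)_n$ is a Cauchy sequence in $\mathcal{L}^1(\Omega, \mathcal{F}, \mu)$. Then
there exists a subsequence $(f_{n_k})_k$ and an $f \in \mathcal{L}^1$ such that $f_{n_k} \to f$ $\mu$-a.e. as $k \to \infty$.
Furthermore, the original sequence converges in $\mathcal{L}^1$: $\|f_n - f\|_1 \to 0$ as $n \to \infty$.

(b) (Riesz-Fischer Theorem for $\mathcal{L}^1$) $(\mathcal{L}^1(\Omega, \mathcal{F}, \mu), \|\cdot\|_1)$ is a Banach space.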

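Proof: Suppose that $(f_n)_n$ is a Cauchy sequence in $\mathcal{L}^1$. Thus for any $\varepsilon > 0$, we may pick $N_\varepsilon \in \mathbb{N}$ such that $\sup_{m,n \ge N_\varepsilon} \|f_m - f_n\|_1 \le \varepsilon$. We choose a strictly increasing sequence $(n_k)_k$ in $\mathbb{N}$ as follows: First, choose $n_1$ to be an $N$ that works for $\varepsilon = \frac12$, i.e. choose $n_1$ so that $\sup_{m,n \ge n_1} \|f_m - f_n\|_1 < \frac12$. Next, for $k > 1$, $n_k$ is an $N$ that works for $\varepsilon = 2^{-k}$, i.e. choose $n_k > n_{k-1}$ so that $\sup_{m \ge n_k} \|f_m - f_{n_k}\|_1 < 2^{-k}$.

Note that $\|f_{n_{k+1}} - f_{n_k}\|_1 < 2^{-k}$ for all $k \in \mathbb{N}$. Define

$$g_1 := f_{n_1}, \qquad g_k := f_{n_k} - f_{n_{k-1}} \quad \text{for } k > 1$$

so that

$$f_{n_k} = \sum_{i=1}^k g_i \qquad \text{and} \qquad \|g_i\|_1 < 2^{-(i-1)} \quad \text{for } i > 1$$

Hence $\sum_{i=1}^\infty \int |g_i| \, d\mu < \infty$, and by the preceding lemma, $\sum_{i=1}^\infty g_i$ converges (absolutely) $\mu$-a.e., and $f := \sum_{i=1}^\infty g_i \in \mathcal{L}^1$. As $f = \sum_{i=1}^\infty g_i = \lim_{k \to \infty} \sum_{i=1}^k g_i = \lim_{k \to \infty} f_{n_k}$, we see that $f_{n_k} \to f$ $\mu$-a.e.

To finish (a), we still need to show that $\|f_n - f\|_1 \to 0$ as $n \to \infty$. Suppose that $\varepsilon > 0$. Choose $N \in \mathbb{N}$ so that $\sup_{m,n \ge N} \|f_m - f_n\|_1 \le \varepsilon$. Now fix an $n \ge N$ (but allow $k$ to vary). Note that $\lim_k |f_{n_k} - f_n| = |f - f_n|$ $\mu$-a.e., and that $n_k \ge N$ for all sufficiently large $k$. It follows by Fatou's Lemma that

$$\|f - f_n\|_1 = \int |f - f_n| \, d\mu = \int \liminf_k |f_{n_k} - f_n| \, d\mu \le \liminf_k \int |f_{n_k} - f_n| \, d\mu \le \varepsilon$$

and hence that $\|f - f_n\|_1 \to 0$ as $n \to \infty$.
(b) Follows immediately from (a).

⊣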

Exercise 9.1.13 We have seen that $\mathcal{L}^1$-convergence need not imply $\mu$-a.e. convergence, and vice versa. Show that if $(f_n)_n$ converges both in $\mathcal{L}^1$ and $\mu$-a.e., then the limits are the same, i.e. show that if $f_n \to f$ in $\mathcal{L}^1$ and $f_n \to g$ $\mu$-a.e., then $f = g$ $\mu$-a.e.
[Hint: If $f_n \to f$ in $\mathcal{L}^1$, we may choose a subsequence so that $f_{n_k} \to f$ $\mu$-a.e.]

□ 



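Now we look at $\mathcal{L}^2(\Omega, \mathcal{F}, \mu) := \{f : \Omega \to \mathbb{C} : f \text{ is } \mathcal{F}\text{-measurable, and } \|f\|_2 < \infty\}$, where $\|f\|_2 := \left(\int |f|^2 \, d\mu\right)^{1/2}$. This space is nicer than $\mathcal{L}^1$. Not only is it a Banach space: it is a Hilbert space, and therefore we can do geometry there.
Define a map $\langle \cdot, \cdot \rangle : \mathcal{L}^2 \times \mathcal{L}^2 \to \mathbb{C}$ by

$$\langle f, g \rangle := \int f \bar{g} \, d\mu$$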

We will show that this is an inner product, provided we agree that two functions are the same 
when they are /x-a.e. equal. 

There are some technical details that need to be verified before we can proceed: 

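Lemma 9.1.14 If $f, g \in \mathcal{L}^2$, then $fg \in \mathcal{L}^1$.

Proof: As $(|f| - |g|)^2 \ge 0$, we have that $|fg| \le 2|f||g| \le |f|^2 + |g|^2$, and thus that $\int |fg| \, d\mu \le \int |f|^2 \, d\mu + \int |g|^2 \, d\mu < \infty$ when $f, g \in \mathcal{L}^2$.

⊣

Since $|\bar{g}|^2 = |g|^2$, we see that $f, g \in \mathcal{L}^2$ implies $f, \bar{g} \in \mathcal{L}^2$, which in turn implies that $f\bar{g} \in \mathcal{L}^1$. Hence $\langle f, g \rangle := \int f \bar{g} \, d\mu$ exists when $f, g \in \mathcal{L}^2$.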

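Lemma 9.1.15 The map $\langle \cdot, \cdot \rangle : \mathcal{L}^2 \times \mathcal{L}^2 \to \mathbb{C} : (f, g) \mapsto \int f \bar{g} \, d\mu$ is an inner product on $\mathcal{L}^2(\Omega, \mathcal{F}, \mu)$, provided we agree that two functions are the same when they are $\mu$-a.e. equal, i.e.

(i) $\langle f_1 + f_2, g \rangle = \langle f_1, g \rangle + \langle f_2, g \rangle$

(ii) $\langle cf, g \rangle = c \langle f, g \rangle$ for all $c \in \mathbb{C}$.

(iii) $\langle f, g \rangle = \overline{\langle g, f \rangle}$

(iv) $\langle f, f \rangle \ge 0$, and $\langle f, f \rangle = 0$ if and only if $f = 0$ $\mu$-a.e.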
Exercise 9.1.16 Prove Lemma 9.1.15. 

□ 

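We already know that for real spaces, the inner product induces a norm, defined by

$$\|v\| = \langle v, v \rangle^{1/2}$$

We want to verify that this is also the case for complex spaces.
Note that for any $\lambda \in \mathbb{C}$ we have

$$0 \le \langle v - \lambda w, v - \lambda w \rangle = \langle v, v \rangle - \langle v, \lambda w \rangle - \langle \lambda w, v \rangle + \langle \lambda w, \lambda w \rangle$$
$$= \langle v, v \rangle - \bar{\lambda} \langle v, w \rangle - \lambda \overline{\langle v, w \rangle} + |\lambda|^2 \langle w, w \rangle$$
$$= \|v\|^2 - 2\,\mathrm{Re}(\bar{\lambda} \langle v, w \rangle) + |\lambda|^2 \|w\|^2$$

Now write $\langle v, w \rangle = |\langle v, w \rangle| e^{i\theta}$ and take $\lambda = t e^{i\theta}$ with $t$ real, so that $\bar{\lambda} \langle v, w \rangle = t |\langle v, w \rangle|$ and $|\lambda| = |t|$. Thus

$$\|v\|^2 - 2t |\langle v, w \rangle| + t^2 \|w\|^2 \ge 0 \quad \text{for all } t \in \mathbb{R} \qquad (*)$$

This is a quadratic polynomial in $t$ which is always non-negative. Thus, noting that the discriminant must be $\le 0$, or else substituting $t := |\langle v, w \rangle| / \|w\|^2$ (when $w \ne 0$), we obtain: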



Theorem 9.1.17 (Cauchy-Schwarz Inequality)
In any inner product space $(V, \langle \cdot, \cdot \rangle)$, we have

$$|\langle v, w \rangle| \le \|v\| \cdot \|w\|$$

where $\|v\| := \langle v, v \rangle^{1/2}$.
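As a quick sanity check of the Cauchy-Schwarz inequality, here is a minimal sketch in $\mathbb{C}^2$ with the inner product $\langle v, w \rangle = \sum_i v_i \overline{w_i}$. The vectors and the helper names `inner` and `norm` are illustrative assumptions, not taken from the notes.

```python
import math

def inner(v, w):
    # <v, w> = sum of v_i * conj(w_i): linear in the first slot,
    # conjugate-linear in the second, as in Lemma 9.1.15.
    return sum(a * b.conjugate() for a, b in zip(v, w))

def norm(v):
    # <v, v> is real and non-negative, so take the real part before the root.
    return math.sqrt(inner(v, v).real)

v = [1 + 2j, 3 - 1j]
w = [2 - 1j, 1j]
lhs = abs(inner(v, w))        # |<v, w>|
rhs = norm(v) * norm(w)       # ||v|| * ||w||

# Equality case: u is a (complex) scalar multiple of v.
u = [(2 - 3j) * a for a in v]
gap_parallel = norm(v) * norm(u) - abs(inner(v, u))
```

Here `lhs` is strictly smaller than `rhs`, while `gap_parallel` vanishes up to rounding, matching the fact that equality holds for parallel vectors.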

Lemma 9.1.18 Suppose that $V$ is a vector space over $\mathbb{C}$, and that $\langle \cdot, \cdot \rangle : V \times V \to \mathbb{C}$ is an inner product on $V$, i.e. a map satisfying (i)-(iv) of Lemma 9.1.15. Define $\|\cdot\| : V \to \mathbb{R}^+$ by

$$\|v\| = \langle v, v \rangle^{1/2}$$

Then $\|\cdot\|$ is a norm on $V$, i.e.
(i) $\|v + w\| \le \|v\| + \|w\|$
(ii) $\|cv\| = |c| \cdot \|v\|$ for all $c \in \mathbb{C}$
(iii) $\|v\| \ge 0$, and $\|v\| = 0$ if and only if $v = 0$.

Proof: Using the Cauchy-Schwarz inequality, we see that

$$\|v + w\|^2 = \langle v + w, v + w \rangle = \|v\|^2 + 2\,\mathrm{Re}\langle v, w \rangle + \|w\|^2 \le \|v\|^2 + 2|\langle v, w \rangle| + \|w\|^2$$
$$\le \|v\|^2 + 2\|v\| \cdot \|w\| + \|w\|^2 = (\|v\| + \|w\|)^2$$

which yields the triangle inequality (i).
(ii), (iii) are left as easy exercises.

⊣

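Corollary 9.1.19 The map $\|\cdot\|_2 : \mathcal{L}^2 \to \mathbb{R}^+ : f \mapsto \langle f, f \rangle^{1/2} = \left(\int |f|^2 \, d\mu\right)^{1/2}$ is a norm on $\mathcal{L}^2(\Omega, \mathcal{F}, \mu)$, induced by the inner product.

We also have:

Theorem 9.1.20 (Hölder's inequality for $\mathcal{L}^2$)
If $f, g \in \mathcal{L}^2(\Omega, \mathcal{F}, \mu)$, then

$$\|fg\|_1 \le \|f\|_2 \|g\|_2$$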



Exercise 9.1.21 Use the Cauchy-Schwarz inequality to prove Thm. 9.1.20. 

□ 

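Associated with $\|\cdot\|_2$ is another notion of convergence, namely convergence in mean square or $\mathcal{L}^2$-convergence. If $f_n, f \in \mathcal{L}^2(\Omega, \mathcal{F}, \mu)$ for $n \in \mathbb{N}$, then we say that

$$f_n \to f \text{ in } \mathcal{L}^2 \quad \text{if and only if} \quad \|f_n - f\|_2 \to 0$$

Exercise 9.1.22 (a) On $((0,1], \mathcal{B}((0,1]), \lambda)$, define $f_n = I_{A_n}$, where $A_{2^n+k} := \left(\frac{k}{2^n}, \frac{k+1}{2^n}\right]$ for $n, k \in \mathbb{N}$, $0 \le k < 2^n$. Show that $f_n \to 0$ in $\mathcal{L}^2$, but that $(f_n)$ does not converge a.e.

(b) On $(\mathbb{R}^+, \mathcal{B}(\mathbb{R}^+), \lambda)$ define $f_n := \sum_{k=1}^n \frac{1}{k} I_{(k-1,k]}$. Show that $(f_n)_n$ converges in $\mathcal{L}^2$ but not in $\mathcal{L}^1$.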

(c) On ((0, 1],B((0, 1]), A) define /„ := X]fe=i ij- Show that (/„)„ converges in but not in C^. 

□ 

The preceding exercise shows that $\mathcal{L}^1$-convergence need not imply $\mathcal{L}^2$-convergence, or vice versa. For probability spaces, however, we have the following:

Theorem 9.1.23 If $(\Omega, \mathcal{F}, P)$ is a probability space, then

$$\|f\|_1 \le \|f\|_2$$

Thus $\mathcal{L}^2(\Omega, \mathcal{F}, P) \subseteq \mathcal{L}^1(\Omega, \mathcal{F}, P)$. Moreover, if $f_n \to f$ in $\mathcal{L}^2$, then also $f_n \to f$ in $\mathcal{L}^1$.

Exercise 9.1.24 Prove Thm. 9.1.23.

[Hint: Show $1 \in \mathcal{L}^2$. Apply Hölder's inequality to $f$ and $1$.]



□ 






Recall that an inner product space which is complete (i.e. in which every Cauchy sequence converges) is called a Hilbert space.

Theorem 9.1.25 (Riesz-Fischer Theorem for $\mathcal{L}^2$)
$\mathcal{L}^2(\Omega, \mathcal{F}, \mu)$ is complete w.r.t. the norm $\|\cdot\|_2$ induced by the inner product. Thus it is a Hilbert space.

Proof: Suppose that $(f_n)_n$ is a Cauchy sequence in $\mathcal{L}^2$. Choose an increasing sequence $(n_k)_k$ of natural numbers such that $\sup_{m \ge n_k} \|f_m - f_{n_k}\|_2 < 2^{-k}$ for each $k \in \mathbb{N}$. Then by the MCT and the triangle inequality, $\left\| \sum_k |f_{n_{k+1}} - f_{n_k}| \right\|_2 \le \sum_k \|f_{n_{k+1}} - f_{n_k}\|_2 < \infty$, hence $\sum_k |f_{n_{k+1}} - f_{n_k}| < \infty$ $\mu$-a.e., and hence $(f_{n_k})_k$ is a Cauchy sequence $\mu$-a.e. Define $f : \Omega \to \mathbb{C}$ by $f(\omega) = \lim_k f_{n_k}(\omega)$, if this limit exists, and $f(\omega) = 0$ else. Then $f$ is measurable, and $f_{n_k} \to f$ $\mu$-a.e. as $k \to \infty$. By Fatou's Lemma,

$$\|f\|_2 \le \liminf_k \|f_{n_k}\|_2 < \infty$$

(because Cauchy sequences are bounded), so that $f \in \mathcal{L}^2$, and similarly

$$\|f - f_n\|_2 \le \liminf_k \|f_{n_k} - f_n\|_2 \le \sup_{m \ge n} \|f_m - f_n\|_2 \to 0 \quad \text{as } n \to \infty$$

Thus $f_n \to f$ in $\mathcal{L}^2$.

⊣

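9.1.4 General Theory of $\mathcal{L}^p$ spaces*

We recall here the definition of the $\mathcal{L}^p$-spaces. We also introduce the space $\mathcal{L}^\infty$:

Definition 9.1.26 Suppose that $(\Omega, \mathcal{F}, \mu)$ is a measure space. If $1 \le p < \infty$ ($p$ need not be an integer), then $\mathcal{L}^p(\Omega, \mathcal{F}, \mu)$ is defined to be the set of all $\mathcal{F}$-measurable (real- or complex-valued) functions $f : \Omega \to \mathbb{R}$ such that $|f|^p$ is $\mu$-integrable.

A function $f : \Omega \to \mathbb{R}$ is said to be essentially bounded iff there is a real number $M$ such that $|f| \le M$ $\mu$-a.e.

$\mathcal{L}^\infty(\Omega, \mathcal{F}, \mu)$ is defined to be the set of all essentially bounded $\mathcal{F}$-measurable $f : \Omega \to \mathbb{R}$.
For each $1 \le p < \infty$, we define a map $\|\cdot\|_p : \mathcal{L}^p(\Omega, \mathcal{F}, \mu) \to \mathbb{R}$ by

$$\|f\|_p := \left( \int |f|^p \, d\mu \right)^{1/p}$$

We also define a map $\|\cdot\|_\infty : \mathcal{L}^\infty(\Omega, \mathcal{F}, \mu) \to \mathbb{R}$ by

$$\|f\|_\infty = \inf\{M : |f| \le M \ \mu\text{-a.e.}\}$$

The maps $\|\cdot\|_p$ are called $\mathcal{L}^p$-norms, or just $p$-norms.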

Remarks 9.1.27 The definition of $\mathcal{L}^\infty$ differs from that of the other $\mathcal{L}^p$-spaces, so it is worth elaborating a little on it. A real- or complex-valued measurable function $f$ is essentially bounded if there is a real number $M \ge 0$ such that $|f| \le M$ $\mu$-a.e., i.e. the set $\{\omega \in \Omega : |f(\omega)| > M\}$ is a $\mu$-null set. Call such an $M$ an essential bound of $f$. Then $\|f\|_\infty$ is defined to be the infimum of all the essential bounds. Note that $\|f\|_\infty$ is itself an essential bound of $f$. Indeed,

$$\{\omega \in \Omega : |f(\omega)| > \|f\|_\infty\} = \bigcup_n \{\omega \in \Omega : |f(\omega)| > \|f\|_\infty + \tfrac{1}{n}\}$$

is a countable union of $\mu$-null sets, and thus itself a $\mu$-null set. Hence $\|f\|_\infty$ is the smallest essential bound of $f$, i.e. for all $M < \|f\|_\infty$, we have $\mu\{\omega \in \Omega : |f(\omega)| > M\} > 0$.

We have already shown that the $\mathcal{L}^p$-spaces are vector spaces for $1 \le p < \infty$. The same is true for $\mathcal{L}^\infty$: If $f, g \in \mathcal{L}^\infty$, then $|f(\omega) + g(\omega)| \le |f(\omega)| + |g(\omega)| \le \|f\|_\infty + \|g\|_\infty$ for $\mu$-almost all $\omega \in \Omega$. Hence $\|f\|_\infty + \|g\|_\infty$ is an essential bound for $f + g$, and moreover, we have a triangle inequality:

$$\|f + g\|_\infty \le \|f\|_\infty + \|g\|_\infty$$

□ 

If the underlying measure space is understood from context, we shall write $\mathcal{L}^p$ instead of $\mathcal{L}^p(S, \mathcal{S}, \mu)$. For the next theorem, note that if $a, b \ge 0$, and if $1 < p, q < \infty$ are such that $\frac{1}{p} + \frac{1}{q} = 1$, then

$$ab \le \frac{a^p}{p} + \frac{b^q}{q}$$

To see this, define $h(t) = tb - \frac{t^p}{p}$, and find the maximum of $h$. Alternatively, apply the Arithmetic-Geometric Mean inequality.
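The inequality $ab \le a^p/p + b^q/q$ is easy to probe numerically. The sketch below is illustrative only: the function name `young_gap` and the sample grid are assumptions, and floating-point arithmetic is used, so comparisons carry a small tolerance.

```python
def young_gap(a, b, p):
    """Return a**p/p + b**q/q - a*b, where q is the conjugate exponent of p > 1."""
    q = p / (p - 1)
    return a**p / p + b**q / q - a * b

# The gap should be non-negative on any grid of non-negative a, b.
grid = [0.0, 0.1, 0.5, 1.0, 2.0, 7.5]
gaps = [young_gap(a, b, p) for a in grid for b in grid for p in (1.5, 2.0, 3.0)]

# Equality holds when a**p == b**q, e.g. p = 3, q = 1.5, a = 2, b = 4.
equality_gap = young_gap(2.0, 4.0, 3.0)
```

Maximizing $h(t) = tb - t^p/p$ as suggested above locates the equality case at $t^{p-1} = b$, i.e. $a^p = b^q$, which is what `equality_gap` checks.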

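Theorem 9.1.28 Let $(S, \mathcal{S}, \mu)$ be a measure space and let $f, g$ be real-valued $\mathcal{S}$-measurable functions.

(a) HÖLDER'S INEQUALITY: Suppose that $1 \le p < \infty$ and that $p^{-1} + q^{-1} = 1$. If $f \in \mathcal{L}^p$ and $g \in \mathcal{L}^q$, then $fg \in \mathcal{L}^1$, and

$$\|fg\|_1 \le \|f\|_p \|g\|_q$$

(b) MINKOWSKI'S INEQUALITY: Let $p \ge 1$. If $f, g \in \mathcal{L}^p$, then

$$\|f + g\|_p \le \|f\|_p + \|g\|_p$$

Proof: (a) If $p = 1$ (so that $q = +\infty$), then $|fg| \le |f| \, \|g\|_\infty$ $\mu$-a.e. and so $\|fg\|_1 = \int |fg| \, d\mu \le \int |f| \, d\mu \cdot \|g\|_\infty = \|f\|_1 \|g\|_\infty < \infty$.

If $p > 1$, put $a = \frac{|f(s)|}{\|f\|_p}$, $b = \frac{|g(s)|}{\|g\|_q}$ and apply the remark just before the statement of the theorem to conclude

$$\frac{|f(s) g(s)|}{\|f\|_p \|g\|_q} \le \frac{|f(s)|^p}{p \|f\|_p^p} + \frac{|g(s)|^q}{q \|g\|_q^q}$$

Integrating both sides w.r.t. $\mu$ yields the result, since the right-hand side integrates to $\frac{1}{p} + \frac{1}{q} = 1$.

(b) This relation is easy to prove if $p = 1$ or $p = \infty$. For $1 < p < \infty$, note that $q = \frac{p}{p-1}$, and thus that $|f + g|^{p-1} \in \mathcal{L}^q$. By Hölder's inequality,

$$\|f + g\|_p^p \le \int |f| \, |f + g|^{p-1} \, d\mu + \int |g| \, |f + g|^{p-1} \, d\mu$$
$$\le \|f\|_p \cdot \| (f+g)^{p-1} \|_q + \|g\|_p \cdot \| (f+g)^{p-1} \|_q$$
$$= (\|f\|_p + \|g\|_p) \, \|f + g\|_p^{p-1}$$

Dividing by $\|f + g\|_p^{p-1}$ (when it is non-zero) gives the result.

⊣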
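Both inequalities can be checked on a finite measure space, where integrals are weighted sums. This is a hedged sketch: the weights, the functions `f` and `g`, and the helper `norm` are made-up illustrative data, not part of the notes.

```python
def norm(values, weights, p):
    """p-norm of a function on a finite measure space: (sum_i w_i |x_i|^p)^(1/p)."""
    return sum(w * abs(x) ** p for x, w in zip(values, weights)) ** (1.0 / p)

weights = [0.5, 2.0, 1.5]          # measure of each of three points
f = [1.0, -2.0, 0.5]
g = [3.0, 0.25, -4.0]
p = 3.0
q = p / (p - 1)                    # conjugate exponent

fg = [a * b for a, b in zip(f, g)]
f_plus_g = [a + b for a, b in zip(f, g)]

holder_lhs = norm(fg, weights, 1)
holder_rhs = norm(f, weights, p) * norm(g, weights, q)

minkowski_lhs = norm(f_plus_g, weights, p)
minkowski_rhs = norm(f, weights, p) + norm(g, weights, p)
```

In both cases the left-hand side should not exceed the right-hand side, for any choice of non-negative weights.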

It is now clear that $\|\cdot\|_p$ satisfies the following:

(i) $\|f\|_p \ge 0$, and $\|f\|_p = 0$ iff $f = 0$ $\mu$-a.e.

(ii) If $a \in \mathbb{R}$, then $\|af\|_p = |a| \, \|f\|_p$

(iii) $\|f + g\|_p \le \|f\|_p + \|g\|_p$

Thus $\|\cdot\|_p$ is almost a norm on $\mathcal{L}^p$. The requirement that $\|f\|_p = 0$ iff $f = 0$ doesn't hold, but holds only almost everywhere. To get a bona fide norm, we must identify any two functions that are equal $\mu$-a.e.:

Definition and Proposition 9.1.29 Let $(S, \mathcal{S}, \mu)$ be a measure space, and let $1 \le p \le \infty$. Define a relation $\equiv$ on $\mathcal{L}^p$ by $f \equiv g$ iff $f = g$ $\mu$-a.e. Then $\equiv$ is an equivalence relation on $\mathcal{L}^p$. Define $[f] := \{g \in \mathcal{L}^p : f \equiv g\}$. Then $[0] = \{g \in \mathcal{L}^p : g = 0 \ \mu\text{-a.e.}\}$ is a vector subspace of $\mathcal{L}^p$, and $[f] = f + [0] := \{f + g : g \in [0]\}$. Let

$$L^p(S, \mathcal{S}, \mu) = \{[f] : f \in \mathcal{L}^p(S, \mathcal{S}, \mu)\}$$

Then $L^p$ is a vector space and the map, which by abuse of notation we also call $\|\cdot\|_p$, which is defined by

$$\|[f]\|_p := \|f\|_p$$

is a norm on $L^p$.

Proof: That $\equiv$ is an equivalence relation is straightforward, as is the statement that $[0]$ is a vector subspace of $\mathcal{L}^p$. It is also easy to see that $L^p$ is a vector space, if the operations are defined in the natural way (e.g. $[f] + [g] := [f + g]$; one must check that this is well-defined, i.e. that if $[f_1] = [f_2]$ and $[g_1] = [g_2]$, then $[f_1 + g_1] = [f_2 + g_2]$, but that is easy). $[0]$ is clearly the zero vector in $L^p$. Also, if $[f_1] = [f_2]$, then $f_1 = f_2$ $\mu$-a.e., and thus $\int |f_1|^p \, d\mu = \int |f_2|^p \, d\mu$, which shows that $\|f_1\|_p = \|f_2\|_p$ (in case $p < \infty$), and thus that $\|\cdot\|_p$ is well-defined on $L^p$. To see that it is a norm, note that (i) $\|[f]\|_p = \|f\|_p \ge 0$, and that $\|[f]\|_p = 0$ iff $f = 0$ $\mu$-a.e. iff $[f] = [0]$; (ii) $\|a[f]\|_p = \|af\|_p = |a| \, \|[f]\|_p$; and (iii) $\|[f] + [g]\|_p = \|f + g\|_p \le \|[f]\|_p + \|[g]\|_p$.
In case $p = \infty$, it is also straightforward to see that $\|\cdot\|_\infty$ is a well-defined norm on $L^\infty$.

⊣

In practice, we usually don't bother too much about the distinction between $\mathcal{L}^p$ and $L^p$.
Now that we know that $\|\cdot\|_p$ is a norm, we have a notion of convergence:

Definition 9.1.30 A sequence $(f_n)_n$ in $\mathcal{L}^p(S, \mathcal{S}, \mu)$ is said to converge to $f \in \mathcal{L}^p(S, \mathcal{S}, \mu)$ in $L^p$ (or in $p^{\text{th}}$ mean) iff $\|f_n - f\|_p \to 0$.

□

We now have two notions of convergence for measurable functions: almost everywhere convergence, and convergence in mean. We write

$$f_n \xrightarrow{\text{a.e.}} f \qquad f_n \xrightarrow{L^p} f$$

In a later section, we will investigate this, and other, notions of convergence in greater detail. 






Theorem 9.1.31 (Riesz-Fischer)

If $(S, \mathcal{S}, \mu)$ is a measure space and $1 \le p \le \infty$, then $L^p(S, \mathcal{S}, \mu)$ is a Banach space.

Proof: Suppose that $(f_n)_n$ is a Cauchy sequence in $\mathcal{L}^p$, i.e. that $\sup_{m \ge n} \|f_m - f_n\|_p \to 0$ as $n \to \infty$.

First assume that $p < \infty$. Choose an increasing sequence $(n_k)_k$ of natural numbers such that $\sup_{m \ge n_k} \|f_m - f_{n_k}\|_p < 2^{-k}$ for each $k \in \mathbb{N}$. Then by the MCT and Minkowski's inequality, $\left\| \sum_k |f_{n_{k+1}} - f_{n_k}| \right\|_p \le \sum_k \|f_{n_{k+1}} - f_{n_k}\|_p < \infty$, hence $\sum_k |f_{n_{k+1}} - f_{n_k}| < \infty$ $\mu$-a.e., and hence $(f_{n_k})_k$ is a Cauchy sequence $\mu$-a.e. Define $f : S \to \mathbb{R}$ by $f(s) = \lim_k f_{n_k}(s)$, if this limit exists, and $f(s) = 0$ else. Then $f$ is measurable, and $f_{n_k} \to f$ $\mu$-a.e. as $k \to \infty$. Then by Fatou's Lemma,

$$\|f\|_p \le \liminf_k \|f_{n_k}\|_p < \infty$$

(because Cauchy sequences are bounded), so that $f \in \mathcal{L}^p$, and similarly

$$\|f - f_n\|_p \le \liminf_k \|f_{n_k} - f_n\|_p \le \sup_{m \ge n} \|f_m - f_n\|_p \to 0 \quad \text{as } n \to \infty$$

Thus $f_n \to f$ in $L^p$.

Next, assume that $p = \infty$. We have $\sup_{m \ge n} |f_m - f_n| \le \sup_{m \ge n} \|f_m - f_n\|_\infty$ $\mu$-a.e., and thus $(f_n)_n$ is a Cauchy sequence $\mu$-a.e. Define $f : S \to \mathbb{R}$ as above: $f(s) = \lim_n f_n(s)$ if this limit exists, and $f(s) = 0$ otherwise. Then

$$|f| \le |f_n| + |f_n - f| = |f_n| + \lim_m |f_n - f_m| \le \|f_n\|_\infty + \sup_{m \ge n} \|f_n - f_m\|_\infty \quad \mu\text{-a.e.}$$

so that $f \in \mathcal{L}^\infty$, and

$$|f_n - f| = \lim_k |f_n - f_k| \le \sup_{m \ge n} \|f_n - f_m\|_\infty \quad \mu\text{-a.e.}$$

Hence $\|f_n - f\|_\infty \le \sup_{m \ge n} \|f_n - f_m\|_\infty \to 0$ as $n \to \infty$, proving that $f_n \to f$ in $L^\infty$.

⊣

The following result is easy: 
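Theorem 9.1.32 Let $(S, \mathcal{S}, \mu)$ be a measure space. The map

$$\langle \cdot, \cdot \rangle : \mathcal{L}^2 \times \mathcal{L}^2 \to \mathbb{R} : (f, g) \mapsto \int fg \, d\mu$$

is an inner product on $L^2$ which induces the $L^2$-norm $\|\cdot\|_2$. Hence $L^2$ is a Hilbert space.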

□ 

Exercise 9.1.33 Prove Thm 9.1.32. 

□ 

For probability theory, the following result is also useful: 






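Proposition 9.1.34 Let $(\Omega, \mathcal{F}, P)$ be a probability space. If $1 \le p \le r \le \infty$, then

$$\|X\|_p \le \|X\|_r$$

for any random variable $X$, so that $\mathcal{L}^r(\Omega, \mathcal{F}, P) \subseteq \mathcal{L}^p(\Omega, \mathcal{F}, P)$.

Moreover, if $X \in \mathcal{L}^\infty$, then

$$\|X\|_\infty = \lim_{p \to \infty} \|X\|_p$$

Proof: Note that if $X \in \mathcal{L}^r$, then $|X|^p \in \mathcal{L}^{r/p}$. Now $p' = \frac{r}{p}$ and $q' = \frac{r}{r-p}$ satisfy the relation $\frac{1}{p'} + \frac{1}{q'} = 1$, and so Hölder's inequality applied to $f = |X|^p$ and $g = 1$ yields

$$\|X\|_p^p = \int fg \, dP \le \|f\|_{p'} \cdot \|g\|_{q'} = \left( \int |X|^r \, dP \right)^{p/r} \cdot 1 = \|X\|_r^p$$

Next, suppose that $X \in \mathcal{L}^\infty$. Then $\|X\|_p \le \|X\|_\infty$, so $\limsup_p \|X\|_p \le \|X\|_\infty$.

If $M < \|X\|_\infty$, then $\int |X|^p \, dP \ge M^p \, P(|X| > M)$ and so $\|X\|_p \ge M \, P(|X| > M)^{1/p}$. Now $P(|X| > M) > 0$, because $M < \|X\|_\infty$, and thus $\liminf_p \|X\|_p \ge M$ (because $P(|X| > M)^{1/p} \to 1$). Since $M$ was arbitrary, also $\liminf_p \|X\|_p \ge \|X\|_\infty$.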
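The monotonicity of $p \mapsto \|X\|_p$ and the limit $\|X\|_p \to \|X\|_\infty$ can be illustrated on a finite probability space. The sketch below rests on assumptions: the random variable, its probabilities, and the helper names `p_norm` and `sup_norm` are hypothetical. One outcome is given probability zero to show that the essential supremum ignores null sets.

```python
def p_norm(values, probs, p):
    """||X||_p on a finite probability space: (sum_i P(i) |X(i)|^p)^(1/p)."""
    return sum(w * abs(x) ** p for x, w in zip(values, probs)) ** (1.0 / p)

def sup_norm(values, probs):
    """||X||_inf: largest |X(i)| over outcomes of positive probability."""
    return max(abs(x) for x, w in zip(values, probs) if w > 0)

values = [0.5, -2.0, 3.0, 10.0]
probs = [0.4, 0.3, 0.3, 0.0]       # the outcome with value 10 is a null set

norms = [p_norm(values, probs, p) for p in (1, 2, 4, 8, 64, 256)]
essential_sup = sup_norm(values, probs)
```

The list `norms` is non-decreasing and approaches `essential_sup`, which is 3 rather than 10 because the largest value sits on a set of probability zero.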



9.2 Geometry of Hilbert Space and Generalized Fourier Series 
9.2.1 Projections in Hilbert Spaces 

We have already studied some Hilbert space theory earlier in this course, and we will repeat 
here the most important facts that were then obtained. 

Recall that in $\mathbb{R}^n$, the dot product does not only induce a length (i.e. a norm), but also an angle: The angle $\theta$ between two vectors $x, y \in \mathbb{R}^n$ is given by

$$\cos\theta = \frac{x \cdot y}{\|x\| \, \|y\|}$$

We can imitate this definition in an abstract inner product space $(V, \langle \cdot, \cdot \rangle)$, and define the angle between $x, y \in V$ by

$$\cos\theta := \frac{\langle x, y \rangle}{\|x\| \, \|y\|} \quad \text{where } \|x\| := \sqrt{\langle x, x \rangle}$$

By the Cauchy-Schwarz inequality it follows immediately that $|\cos\theta| \le 1$, so that this definition makes sense. It also follows that $|\cos\theta| = 1$ if and only if $x$ is a scalar multiple of $y$, i.e. iff $x, y$ are parallel. We don't really need the concept of angle, but we do want the associated concept of orthogonality:

Definition 9.2.1 Suppose that $(V, \langle \cdot, \cdot \rangle)$ is an inner product space. We say that $x, y \in V$ are orthogonal, and write $x \perp y$, if and only if $\langle x, y \rangle = 0$.
If $G \subseteq V$, we say that $x \perp G$ iff $\forall g \in G \, (x \perp g)$.



First, you may prove the easy

Proposition 9.2.2 If $V$ is a Hilbert space and $v, w \in V$, then

(a) (Parallelogram Law) $\|v - w\|^2 + \|v + w\|^2 = 2(\|v\|^2 + \|w\|^2)$

(b) (Pythagoras) If $v \perp w$, then $\|v + w\|^2 = \|v\|^2 + \|w\|^2$

If $V$ is a linear subspace of $\mathbb{R}^n$, then we can project any $x \in \mathbb{R}^n$ onto $V$. That is, we can represent $x$ as a sum

$$x = x^\parallel + x^\perp \quad \text{where } x^\parallel \in V, \ x^\perp \perp V$$

One can think of $x^\parallel$ as the best approximation to $x$ in $V$: It is the vector in $V$ which lies closest to $x$.

Suppose that $V$ is a Hilbert space, and that $W$ is a closed linear subspace of $V$. If $v_0 \in V$, we can find the best approximation of $v_0$ in $W$. This is the unique vector $w_0$ with the properties that

(i) $w_0 \in W$, and

(ii) $\|v_0 - w_0\| = \inf\{\|v_0 - w\| : w \in W\}$, i.e. $w_0$ is the vector in $W$ that lies closest to $v_0$.

(iii) Moreover, $(v_0 - w_0) \perp W$.

The vector $w_0$ satisfying (i)-(iii) is called the orthogonal projection of $v_0$ onto $W$. Indeed, $v_0 = w_0 + (v_0 - w_0)$ decomposes $v_0$ into a vector in $W$ and a vector orthogonal to $W$. It remains to show that orthogonal projections exist and are unique.



Proposition 9.2.3 Let $V$ be a Hilbert space, and let $W$ be a closed linear subspace of $V$. Then any $v_0$ in $V$ has a unique decomposition

$$v_0 = v_0^\parallel + v_0^\perp \quad \text{where } v_0^\parallel \in W, \ v_0^\perp \perp W$$

$v_0^\parallel$ is called the orthogonal projection of $v_0$ onto $W$.



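Proof: Uniqueness: If

$$v_0 = v_0^\parallel + v_0^\perp = u_0^\parallel + u_0^\perp$$

where $v_0^\parallel, u_0^\parallel \in W$ and $v_0^\perp, u_0^\perp \perp W$, then

$$v_0^\parallel - u_0^\parallel = u_0^\perp - v_0^\perp =: x$$

is a vector with the properties that $x \in W$ and that $x \perp W$. This implies that $x \perp x$, i.e. that $\langle x, x \rangle = 0$. Hence $x = 0$, and so $u_0^\parallel = v_0^\parallel$, $v_0^\perp = u_0^\perp$.

Existence: Let $\delta = \inf\{\|v_0 - w\| : w \in W\}$, and choose a sequence $w_n \in W$ such that $\|v_0 - w_n\| \to \delta$. We show that $(w_n)_n$ is a Cauchy sequence in $W$: for if $\varepsilon > 0$, we may choose $N$ such that $\|v_0 - w_n\|^2 - \delta^2 < \varepsilon$ whenever $n \ge N$. By the Parallelogram Law it follows that if $n, m \ge N$, then

$$2\varepsilon + 2\delta^2 > \|v_0 - w_n\|^2 + \|v_0 - w_m\|^2 = 2\|v_0 - \tfrac12(w_n + w_m)\|^2 + 2\|\tfrac12(w_n - w_m)\|^2 \ge 2\delta^2 + \tfrac12 \|w_n - w_m\|^2$$

so that $\|w_n - w_m\|^2 < 4\varepsilon$. Since $(w_n)_n$ is a Cauchy sequence, and since $W$ is closed, there is $w_0 \in W$ such that $w_n \to w_0$. We will show that $w_0 = v_0^\parallel$. The fact that $\|v_0 - w_0\| \le \|v_0 - w_n\| + \|w_n - w_0\|$ (for all $n \in \mathbb{N}$) then is easily seen to imply that $\|v_0 - w_0\| = \delta$.

It remains to show that $v_0 - w_0 \perp W$. Given an arbitrary $w \in W$ and a real $\lambda \in \mathbb{R}$, we have $\|v_0 - w_0\|^2 = \delta^2 \le \|v_0 - (w_0 + \lambda w)\|^2$, so that

$$-2\lambda \, \mathrm{Re}\langle v_0 - w_0, w \rangle + \lambda^2 \|w\|^2 \ge 0$$

Since this holds for all real $\lambda$ we must have $\mathrm{Re}\langle v_0 - w_0, w \rangle = 0$; in the complex case, applying the same argument to $iw \in W$ shows that the imaginary part vanishes as well, so that $\langle v_0 - w_0, w \rangle = 0$. (Another way to see this is to note that the quadratic in $\lambda$ has a root at $\lambda = 0$, and to calculate the discriminant.)

⊣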

Exercise 9.2.4 Consider again the problem of linear least squares estimation: Given functions $\varphi_1(t), \dots, \varphi_n(t)$ and observations $(t_1, y_1), \dots, (t_m, y_m)$, we seek coefficients $x_1, \dots, x_n$ which best fit the data, i.e. so that

$$y_j \approx \sum_{i=1}^n x_i \varphi_i(t_j) \quad \text{for } j = 1, \dots, m$$

Here, "best" means the following: For each $x := (x_1, \dots, x_n)$ we obtain numbers $y_j^*(x) := \sum_{i=1}^n x_i \varphi_i(t_j)$. We want that $y_j \approx y_j^*(x)$ for all $j = 1, \dots, m$, i.e. we want to determine that $n$-tuple $x$ for which the combined error is as small as possible. We therefore want the distance between the vectors $y := (y_1, \dots, y_m)$ and $y^*(x) := (y_1^*(x), \dots, y_m^*(x))$ to be as small as possible, i.e. we want to find that optimal value $x^*$ of $x$ for which

$$\|y - y^*(x)\|^2 = \sum_{j=1}^m (y_j - y_j^*(x))^2$$

is a minimum, where $\|\cdot\|$ denotes the usual Euclidean norm on $\mathbb{R}^m$.

(a) On $(\mathbb{R}, \mathcal{P}(\mathbb{R}))$, consider the measure $\mu = \delta_{t_1} + \dots + \delta_{t_m}$ (where $\delta_a$ is the Dirac measure located at $a$). For $f : \mathbb{R} \to \mathbb{R}$, show that $\|f\|_2^2 = \sum_{j=1}^m |f(t_j)|^2$.

(b) Conclude that $\mathcal{L}^2(\mathbb{R}, \mathcal{P}(\mathbb{R}), \mu)$ is simply the set of all functions $f : \mathbb{R} \to \mathbb{R}$.

(c) We need, however, to recall the convention that we regard functions which are $\mu$-a.e. equal as the same function. Show that if $f, g : \mathbb{R} \to \mathbb{R}$, then

$$f = g \ \mu\text{-a.e.} \quad \text{iff} \quad f(t_j) = g(t_j) \text{ for all } j = 1, \dots, m$$

(d) Suppose now that we are given functions $\varphi_1(t), \dots, \varphi_n(t)$. Show that

$$W := \left\{ \sum_{i=1}^n x_i \varphi_i : x_1, \dots, x_n \in \mathbb{R} \right\}$$

is a closed linear subspace of $L^2$.

(e) Given $(t_1, y_1), \dots, (t_m, y_m)$. Let $f : \mathbb{R} \to \mathbb{R}$ be any function such that $f(t_j) = y_j$ for all $j = 1, \dots, m$. (Any two such functions are $\mu$-a.e. equal, and thus "the same".) Let $\psi$ be the orthogonal projection of $f$ onto $W$. Explain why there exist $x_1^*, \dots, x_n^* \in \mathbb{R}$ such that $\psi = \sum_{i=1}^n x_i^* \varphi_i$. Now explain why $x^* = (x_1^*, \dots, x_n^*)$ is precisely the solution to the least squares estimation problem.

Thus:

Least Squares Estimation = Orthogonal Projection

□
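The slogan above can be made concrete with a small straight-line fit. The following is a sketch under assumptions: the data points, the choice $\varphi_1(t) = 1$, $\varphi_2(t) = t$, and all names are illustrative. The inner product is the one induced by $\mu = \delta_{t_1} + \dots + \delta_{t_m}$, and the coefficients are found by demanding that the residual be orthogonal to $\varphi_1$ and $\varphi_2$ (the normal equations, solved here by Cramer's rule).

```python
def inner(f, g, ts):
    """Inner product of L^2(mu) for mu = sum of Dirac measures at the points ts."""
    return sum(f(t) * g(t) for t in ts)

ts = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 2.9, 5.1, 7.0]
data = dict(zip(ts, ys))

f = lambda t: data[t]        # any function with f(t_j) = y_j; only the t_j matter
phi1 = lambda t: 1.0
phi2 = lambda t: t

# Orthogonality of the residual to phi1 and phi2 gives a 2x2 linear system.
g11, g12, g22 = inner(phi1, phi1, ts), inner(phi1, phi2, ts), inner(phi2, phi2, ts)
b1, b2 = inner(f, phi1, ts), inner(f, phi2, ts)
det = g11 * g22 - g12 * g12
x1 = (b1 * g22 - b2 * g12) / det
x2 = (g11 * b2 - g12 * b1) / det

psi = lambda t: x1 * phi1(t) + x2 * phi2(t)   # projection of f onto W
residual = lambda t: f(t) - psi(t)            # should be orthogonal to W
```

For this data the fitted line is approximately $0.97 + 2.02\,t$, and the residual has zero inner product with both basis functions, as property (iii) of the orthogonal projection requires.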

Exercise 9.2.5 Suppose that $W$ is a closed subspace of a Hilbert space $(V, \langle \cdot, \cdot \rangle)$. For each $v \in V$, let $v^\parallel$ be the orthogonal projection of $v$ onto $W$. Define a map $P : V \to V$ by $Pv := v^\parallel$. Use the uniqueness of the orthogonal projection to prove the following results.

(a) Show that $P$ is a bounded linear operator.

(b) Show that $P$ is idempotent, i.e. that $P^2 = P$ (i.e. $P(Pv) = Pv$ for all $v \in V$).

(c) Show that $P$ is self-adjoint, i.e. that $\langle Pv_1, v_2 \rangle = \langle Pv_1, Pv_2 \rangle = \langle v_1, Pv_2 \rangle$ for all $v_1, v_2 \in V$.

(d) Show that $\ker P = (\operatorname{ran} P)^\perp = W^\perp$.

□ 

9.2.2 Orthonormal Bases 

It is well-known that every vector space has a basis, i.e. a maximal linearly independent set. We call such a basis a Hamel basis. Every vector can then be written as a linear combination of these basis vectors in a unique way.

For Hilbert spaces, we have another notion of basis, namely that of orthonormal basis.

Definition 9.2.6 A subset $\Phi$ of a Hilbert space $V$ is said to be an orthonormal subset if and only if

(i) $\|\varphi\| = 1$ for all $\varphi \in \Phi$ (normality)

(ii) If $\varphi_1 \ne \varphi_2 \in \Phi$, then $\varphi_1 \perp \varphi_2$ (orthogonality)

An orthonormal basis for $V$ is a maximal orthonormal subset of $V$. Such a set is also referred to as a complete orthonormal set.

In $\mathbb{R}^n$, the standard basis vectors form an orthonormal basis.

Clearly, $\Phi$ is an orthonormal basis for $V$ if and only if we cannot find any vectors $v$ of unit length which are orthogonal to every vector in $\Phi$. Since non-zero vectors can always be scaled to vectors of unit length, we have shown the following trivial but useful fact:

Lemma 9.2.7 $\Phi$ is an orthonormal basis for a Hilbert space $V$ if and only if for any $v \in V$ we have

$$v \perp \varphi \text{ for all } \varphi \in \Phi \quad \text{implies} \quad v = 0$$

It can be shown that if $\Psi$ is an orthonormal subset of $V$, then there is an orthonormal basis $\Phi$ such that $\Psi \subseteq \Phi$, i.e. that:

Proposition 9.2.8 Every orthonormal subset of a Hilbert space can be extended to an orthonormal basis.

We will not prove this proposition, but the idea is simple: Keep adding orthonormal vectors to $\Psi$ until you can't find any more. Then stop. We did precisely this by induction earlier in the course for finite-dimensional Hilbert spaces (Gram-Schmidt orthogonalization). The only problem for general, non-finite dimensional Hilbert spaces is that one will not stop in finite time, i.e. one needs to use a transfinite induction, and this requires a deeper understanding of set theory than you currently possess. For this reason, we will restrict ourselves to separable Hilbert spaces.






Definition 9.2.9 A Hilbert space is said to be separable if and only if it has a countable orthonormal basis $\Phi := \{\varphi_n : n \in \mathbb{N}\}$. Thus the sequence $(\varphi_n)_n$ has the following properties:

(i) $\langle \varphi_n, \varphi_m \rangle = \delta_{nm}$ (orthonormality)

(ii) If $v \perp \varphi_n$ for all $n \in \mathbb{N}$, then $v = 0$. (completeness)

We will now show that every vector $v$ in a separable Hilbert space $V$ can be expressed as an "infinite linear combination" of orthonormal basis vectors $(\varphi_n)_n$, i.e. that

$$v = \sum_{n=1}^\infty c_n \varphi_n \quad \text{for unique } c_n \in \mathbb{C}$$

First, however, we need to say what we mean by the expression $\sum_{n=1}^\infty c_n \varphi_n$. This is not an infinite series with number-terms, but vector-terms. The definition works in any normed linear space:



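Definition 9.2.10 Suppose that $(V, \|\cdot\|)$ is a normed vector space, and $v_n, v \in V$ for $n \in \mathbb{N}$. Consider the series $\sum_{n=1}^\infty v_n$, and define $S_m := \sum_{n=1}^m v_n$ to be the $m^{\text{th}}$ partial sum (which is just a sum of finitely many vectors and thus well-defined). We say that

$$\sum_{n=1}^\infty v_n = v \quad \text{iff} \quad \lim_{m \to \infty} S_m = v, \ \text{i.e. iff } \|S_m - v\| \to 0 \text{ as } m \to \infty$$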



Speaking of limits, note that the inner product is continuous, i.e. 

Lemma 9.2.11 If $V$ is an inner product space, $v_n, v, w \in V$, and $v_n \to v$, then $\langle v_n, w \rangle \to \langle v, w \rangle$, i.e.

$$\langle \lim_n v_n, w \rangle = \lim_n \langle v_n, w \rangle$$

Exercise 9.2.12 Prove Lemma 9.2.11.

[Hint: Apply the Cauchy-Schwarz inequality to $\langle v_n - v, w \rangle$.]

□

Remarks 9.2.13 Note that the limit in $\langle \lim_n v_n, w \rangle$ is a limit of vectors in the inner product space, whereas the limit in $\lim_n \langle v_n, w \rangle$ is a limit of complex numbers.

□ 

Now that we know what $v = \sum_{n=1}^\infty c_n \varphi_n$ means, we can try to determine what the Fourier coefficients $c_n$ should be for a given $v \in V$. For motivation, consider $\mathbb{R}^d$ with the Euclidean inner product and the standard basis $(e_n)_{n=1,\dots,d}$. Any vector $x = (x_1, \dots, x_d)$ can be written as $x = \sum_{n=1}^d x_n e_n$. A little calculation shows that $x_n = \langle x, e_n \rangle$. We try to imitate this argument in arbitrary Hilbert spaces.

Note that if it were the case that $v = \sum_{n=1}^\infty c_n \varphi_n$, then it seems natural to argue as follows:

$$\langle v, \varphi_i \rangle = \left\langle \sum_{n=1}^\infty c_n \varphi_n, \varphi_i \right\rangle = \sum_{n=1}^\infty \langle c_n \varphi_n, \varphi_i \rangle = \sum_{n=1}^\infty c_n \delta_{ni} = c_i$$

This argument is purely suggestive — the above operations are valid for finite sums, and we 
must verify that they are valid for infinite series. We begin this now. 

In the following Proposition, note that {(t)n)n need not be an orthonormal basis, just an 
orthonormal set. 



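Proposition 9.2.14 (Bessel's Inequality)

Suppose that $(\varphi_n)_n$ is an orthonormal sequence in an inner product space $V$, and that $v \in V$. Define $c_n := \langle v, \varphi_n \rangle$. Then

$$\sum_{n=1}^\infty |c_n|^2 = \sum_{n=1}^\infty |\langle v, \varphi_n \rangle|^2 \le \|v\|^2$$

Proof: Define $S_n := \sum_{k=1}^n c_k \varphi_k$, and let $v_n := v - S_n$. If $1 \le i \le n$, then

$$\langle v_n, \varphi_i \rangle = \langle v, \varphi_i \rangle - \sum_{k=1}^n \langle v, \varphi_k \rangle \langle \varphi_k, \varphi_i \rangle = \langle v, \varphi_i \rangle - \langle v, \varphi_i \rangle = 0$$

and hence $v_n \perp \varphi_i$ for all $i = 1, \dots, n$, so that in particular $v_n \perp S_n$. By Pythagoras,

$$\|v\|^2 = \|v_n + S_n\|^2 = \|v_n\|^2 + \|S_n\|^2 = \|v_n\|^2 + \left\langle \sum_{i=1}^n c_i \varphi_i, \sum_{j=1}^n c_j \varphi_j \right\rangle$$
$$= \|v_n\|^2 + \sum_{i=1}^n \sum_{j=1}^n c_i \bar{c_j} \langle \varphi_i, \varphi_j \rangle = \|v_n\|^2 + \sum_{i=1}^n |c_i|^2 \ge \sum_{i=1}^n |c_i|^2$$

Now let $n \to \infty$ to obtain the result.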
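The steps of this proof can be traced in $\mathbb{R}^3$ with an orthonormal set that is not a basis. This is only an illustrative sketch: the two orthonormal vectors, the vector `v`, and the helper `dot` are assumptions chosen for the example.

```python
import math

def dot(u, v):
    """Euclidean inner product on R^3."""
    return sum(a * b for a, b in zip(u, v))

s = 1 / math.sqrt(2)
phi = [[s, s, 0.0], [0.0, 0.0, 1.0]]      # orthonormal, but not a basis of R^3
v = [1.0, 2.0, 3.0]

coeffs = [dot(v, p) for p in phi]          # c_n = <v, phi_n>
bessel_sum = sum(c * c for c in coeffs)    # sum of |c_n|^2

# S = sum_n c_n phi_n and the residual v - S, which is orthogonal to each phi_n.
S = [sum(c * p[i] for c, p in zip(coeffs, phi)) for i in range(3)]
residual = [vi - si for vi, si in zip(v, S)]
```

Here `bessel_sum` is 13.5 while $\|v\|^2 = 14$; the missing 0.5 is exactly the squared norm of the residual, in line with the Pythagoras step of the proof.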



Corollary 9.2.15 Suppose that $(\varphi_n)_n$ is an orthonormal sequence in an inner product space $V$, that $v \in V$, and that $c_n := \langle v, \varphi_n \rangle$. Then $c_n \to 0$ as $n \to \infty$.

Exercise 9.2.16 Prove Corollary 9.2.15.

□

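Next, we show that if $(\varphi_n)_n$ is an orthonormal sequence (not necessarily a basis) in a Hilbert space $V$, and if $v \in V$, then $\sum_{n=1}^\infty \langle v, \varphi_n \rangle \varphi_n$ always converges (though not necessarily to $v$), and that inner products commute with infinite sums also.

Proposition 9.2.17 (a) If $(\varphi_n)_n$ is an orthonormal sequence in a Hilbert space $V$, and if $v \in V$, then $\sum_{n=1}^\infty \langle v, \varphi_n \rangle \varphi_n$ converges.

(b) If $v = \sum_{n=1}^\infty c_n \varphi_n$, then $\langle v, \varphi_n \rangle = c_n$.

Proof: (a) Let $c_n := \langle v, \varphi_n \rangle$, $S_n := \sum_{k=1}^n c_k \varphi_k$. We must show that $(S_n)_n$ converges, and for that, it suffices to show that it is a Cauchy sequence in $V$. Now if $n < m$, then

$$\|S_m - S_n\|^2 = \left\| \sum_{k=n+1}^m c_k \varphi_k \right\|^2 = \sum_{k=n+1}^m |c_k|^2$$

Now $\left( \sum_{k=1}^n |c_k|^2 \right)_n$ is an increasing sequence of non-negative real numbers. By Bessel's inequality, $\sum_{n=1}^\infty |c_n|^2 \le \|v\|^2$, and hence $\sum_{n=1}^\infty |c_n|^2$ converges. From this we see that $\left( \sum_{k=1}^n |c_k|^2 \right)_n$ is a Cauchy sequence (as a sequence of reals converges iff it is Cauchy). It follows that, for any $\varepsilon > 0$, we have $\sum_{k=n+1}^m |c_k|^2 < \varepsilon$ whenever $m > n$ are sufficiently large, and thus that $\|S_m - S_n\|^2 < \varepsilon$ when $m > n$ are sufficiently large.

(b) We have $v = \lim_n S_n$, and hence $\langle v, \varphi_k \rangle = \lim_n \langle S_n, \varphi_k \rangle$. But clearly

$$\langle S_n, \varphi_k \rangle = \sum_{j=1}^n c_j \langle \varphi_j, \varphi_k \rangle = \begin{cases} 0 & \text{if } n < k \\ c_k & \text{if } n \ge k \end{cases}$$

so that $\lim_n \langle S_n, \varphi_k \rangle = c_k$.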


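Theorem 9.2.18 Suppose that $(\varphi_n)_n$ is an orthonormal sequence in a Hilbert space $V$. The following are equivalent:

(a) $(\varphi_n)_n$ is complete, i.e. an orthonormal basis.

(b) If $v \perp \varphi_n$ for all $n$, then $v = 0$.

(c) If $v \in V$, then $v = \sum_{n=1}^\infty \langle v, \varphi_n \rangle \varphi_n$

(d) If $v, w \in V$, then $\langle v, w \rangle = \sum_{n=1}^\infty \langle v, \varphi_n \rangle \langle \varphi_n, w \rangle$

(e) (Parseval's Identity) If $v \in V$, then $\|v\|^2 = \sum_{n=1}^\infty |\langle v, \varphi_n \rangle|^2$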



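Proof: (a) $\Leftrightarrow$ (b) is Lemma 9.2.7.

(b) $\Rightarrow$ (c): Let $c_n := \langle v, \varphi_n \rangle$, and $w := \sum_{n=1}^\infty c_n \varphi_n$ (which we know exists by Propn. 9.2.17). Then by Propn. 9.2.17(b)

$$\langle v - w, \varphi_n \rangle = \langle v, \varphi_n \rangle - c_n = 0 \quad \text{for all } n \in \mathbb{N}$$

and hence $v - w = 0$.

(c) $\Rightarrow$ (d): If $v = \sum_{n=1}^\infty c_n \varphi_n$ and $w = \sum_{n=1}^\infty d_n \varphi_n$, then $\langle v, \varphi_n \rangle = c_n$, $\langle \varphi_n, w \rangle = \overline{\langle w, \varphi_n \rangle} = \bar{d_n}$, and

$$\langle v, w \rangle = \lim_{p \to \infty} \sum_{n=1}^p \langle c_n \varphi_n, w \rangle = \lim_{p \to \infty} \sum_{n=1}^p \left( \lim_{q \to \infty} \sum_{m=1}^q \langle c_n \varphi_n, d_m \varphi_m \rangle \right) = \lim_{p \to \infty} \sum_{n=1}^p c_n \bar{d_n} = \sum_{n=1}^\infty c_n \bar{d_n}$$

(d) $\Rightarrow$ (e): $\|v\|^2 = \langle v, v \rangle = \sum_{n=1}^\infty \langle v, \varphi_n \rangle \langle \varphi_n, v \rangle = \sum_{n=1}^\infty c_n \bar{c_n}$.

(e) $\Rightarrow$ (a): If $(\varphi_n)_n$ is not complete, i.e. not a maximal orthonormal set, then there is $\psi \in V$ so that $\|\psi\| = 1$ and such that $\psi \perp \varphi_n$ for all $n$. Then by hypothesis

$$1 = \|\psi\|^2 = \sum_{n=1}^\infty |\langle \psi, \varphi_n \rangle|^2 = 0$$

a contradiction.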



We have now shown that if $(\varphi_n)_n$ is a complete orthonormal sequence in a Hilbert space $V$, then every vector $v$ can be written as an infinite series

$$v = \sum_{n=1}^\infty c_n \varphi_n \quad \text{where } c_n = \langle v, \varphi_n \rangle$$

The $c_n$ are known as the Fourier coefficients of the vector $v$ relative to the orthonormal basis $(\varphi_n)_n$.

Even if the orthonormal sequence $(\varphi_n)_n$ is not complete, the sum $\sum_{n=1}^\infty c_n \varphi_n$ nevertheless has a nice interpretation:



Theorem 9.2.19 Suppose that $V$ is a Hilbert space, and that $W$ is a closed subspace of $V$. Let $(\varphi_n)_n$ be an orthonormal basis for $W$. Then

$$\sum_{n=1}^\infty \langle v, \varphi_n \rangle \varphi_n \quad \text{is the orthogonal projection of } v \text{ onto } W$$



Exercise 9.2.20 Prove Thm. 9.2.19. 



□ 



We recall the following result: 



Theorem 9.2.21 (Gram-Schmidt Orthogonalization) If $(V, \langle \cdot, \cdot \rangle)$ is an inner product space, and $\{v_1, v_2, v_3, \dots\}$ form a linearly independent set, then there is an orthonormal set $\{\varphi_1, \varphi_2, \varphi_3, \dots\}$ such that for all $n$

$$\operatorname{span}\{v_1, \dots, v_n\} = \operatorname{span}\{\varphi_1, \dots, \varphi_n\}$$

Proof: Suppose that $\{v_1, v_2, v_3, \dots\}$ is a linearly independent subset of $V$. We proceed by inductively building an orthonormal set $\{\varphi_1, \varphi_2, \varphi_3, \dots\}$ so that

$$\operatorname{span}\{v_1, \dots, v_n\} = \operatorname{span}\{\varphi_1, \dots, \varphi_n\} \quad \text{for all } n$$

Define $\varphi_1 := \frac{v_1}{\|v_1\|}$. Then $\|\varphi_1\| = 1$, and certainly $\operatorname{span}\{\varphi_1\} = \operatorname{span}\{v_1\}$.

Assume now that we have already defined $\varphi_1, \dots, \varphi_n$ so that $\{\varphi_1, \dots, \varphi_n\}$ is an orthonormal set with the same span as $\{v_1, \dots, v_n\}$. We must now define $\varphi_{n+1}$. First define

$$w_{n+1} = v_{n+1} - \sum_{j=1}^n \langle v_{n+1}, \varphi_j \rangle \varphi_j$$

and note that

(i) $w_{n+1} \ne 0$, for otherwise $v_{n+1}$ would be a linear combination of $\varphi_1, \dots, \varphi_n$, and thus a linear combination of $v_1, \dots, v_n$. But $v_1, \dots, v_{n+1}$ are linearly independent.

(ii) If $1 \le j \le n$, then

$$\langle w_{n+1}, \varphi_j \rangle = \langle v_{n+1}, \varphi_j \rangle - \sum_{k=1}^n \langle v_{n+1}, \varphi_k \rangle \langle \varphi_k, \varphi_j \rangle = 0$$

It follows that $\varphi_1, \dots, \varphi_n, w_{n+1}$ form an orthogonal set, and are thus linearly independent.

(iii) As $\operatorname{span}\{\varphi_1, \dots, \varphi_n\} = \operatorname{span}\{v_1, \dots, v_n\}$, we see that $\operatorname{span}\{\varphi_1, \dots, \varphi_n, w_{n+1}\} = \operatorname{span}\{v_1, \dots, v_{n+1}\}$.

The only potential problem is that we might not have $\|w_{n+1}\| = 1$. Therefore, define

$$\varphi_{n+1} := \frac{w_{n+1}}{\|w_{n+1}\|}$$
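The inductive construction in this proof translates directly into a short procedure. The following is a minimal sketch for real vectors in $\mathbb{R}^d$ with the Euclidean inner product; the function names and the tolerance used to detect linear dependence are assumptions made for the example.

```python
import math

def dot(u, v):
    """Euclidean inner product."""
    return sum(a * b for a, b in zip(u, v))

def gram_schmidt(vectors, tol=1e-12):
    """Orthonormalize a linearly independent list of real vectors.

    For each v, subtract its components <v, phi_j> phi_j along the vectors
    already built, giving w, and then normalize w.
    """
    basis = []
    for v in vectors:
        w = list(v)
        for phi in basis:
            c = dot(v, phi)
            w = [wi - c * pi for wi, pi in zip(w, phi)]
        length = math.sqrt(dot(w, w))
        if length < tol:
            # Corresponds to step (i): w = 0 would contradict independence.
            raise ValueError("vectors are not linearly independent")
        basis.append([wi / length for wi in w])
    return basis

basis = gram_schmidt([[1.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 1.0]])
```

In floating-point arithmetic this classical form can lose orthogonality for nearly dependent inputs, so the sketch is meant to mirror the proof rather than to serve as a robust numerical routine.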



9.3 Fourier Series 

This section is adapted from Measure Theory and Probability, by Malcolm Adams and Victor 
Guillemin, Wadsworth 1986. 



9.3.1 Statement of Results 

We begin by stating the following theorem: 



Theorem 9.3.1 The functions

$$\varphi_n(x) := \frac{1}{\sqrt{2L}} e^{i n \pi x / L} \qquad (n \in \mathbb{Z})$$

form a complete orthonormal basis for the Hilbert space $L^2([-L, L], \mathcal{B}[-L, L], \lambda)$.



The fact that the $\varphi_n$ form an orthonormal sequence is a rather straightforward exercise:
Exercise 9.3.2 Suppose that $\varphi_n$ are defined as in Thm. 9.3.1. Show that $\langle \varphi_n, \varphi_m \rangle = \delta_{nm}$.



□ 



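The fact that the $\varphi_n$ form a complete set is more difficult, and left for the next subsection.

To say that the $(\varphi_n)_{n \in \mathbb{Z}}$ form an orthonormal basis, rather than just an orthonormal set, means that every function $f \in L^2[-L, L]$ can be represented as a series

$$f(x) = \sum_{n=-\infty}^\infty c_n e^{i n \pi x / L} \quad \text{where } c_n = \frac{1}{2L} \int_{-L}^L f(x) e^{-i n \pi x / L} \, dx$$

(By an expression such as $\sum_{n=-\infty}^\infty c_n e^{inx}$ we mean the limit $\lim_{N \to \infty} \sum_{n=-N}^N c_n e^{inx}$.) If we define $S_N(x) := \sum_{n=-N}^N c_n e^{i n \pi x / L} = c_0 + \sum_{n=1}^N \left( c_n e^{i n \pi x / L} + c_{-n} e^{-i n \pi x / L} \right)$, then this, in turn, means that $S_N \to f$ in $L^2$ as $N \to \infty$.
Now note that

$$\varphi_n(x + 2L) = \varphi_n(x) \quad \text{for all } n \in \mathbb{Z}$$

as $e^{2 \pi i n} = 1$. It follows that if we regard the $\varphi_n$ as functions on $\mathbb{R}$ rather than just $[-L, L]$, then they are periodic with period $2L$, a concept we now define: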



Definition 9.3.3 A measurable function $f : (\mathbb{R}, \mathcal{B}(\mathbb{R})) \to \mathbb{C}$ is periodic with period $a$ iff

$$f(x + a) = f(x) \qquad \text{for all } x \in \mathbb{R}$$

Note that any function $f : (-L, L] \to \mathbb{R}$ can be extended to a periodic function (with period $2L$) in a unique way. For if $x \in \mathbb{R}$, then there is $n \in \mathbb{Z}$ such that $(2n-1)L < x \le (2n+1)L$. Then $-L < x - 2nL \le L$, and necessarily we must have $f(x) := f(x - 2nL)$.

In many cases, however, we can get even stronger convergence than $\mathcal{L}^2$-convergence:



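Theorem 9.3.4 Suppose that $f$ is continuous and periodic of period $2L$, and that it is piecewise differentiable on $[-L,L]$. Also let $c_n := \frac{1}{\sqrt{2L}} \int_{-L}^{L} f(x) e^{-i n \pi x / L}\, dx$. Then

$$\frac{1}{\sqrt{2L}} \sum_{n=-\infty}^{\infty} c_n e^{i n \pi x / L} \quad \text{converges to } f(x) \text{ uniformly in } x,$$

i.e. for every $\varepsilon > 0$ there is $M \in \mathbb{N}$ such that

$$\left| \frac{1}{\sqrt{2L}} \sum_{n=-N}^{N} c_n e^{i n \pi x / L} - f(x) \right| < \varepsilon \quad \text{for all } x \text{ whenever } N \ge M$$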



Although we find it simpler to prove results using the orthonormal basis $\frac{1}{\sqrt{2L}} e^{i n \pi x / L}$, it is customary and useful to state these results in terms of sines and cosines. The next exercise shows how to go from one representation to the other, and thus that they are equivalent.






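Exercise 9.3.5 (a) Note that

$$\cos\theta = \frac{e^{i\theta} + e^{-i\theta}}{2}, \qquad \sin\theta = \frac{e^{i\theta} - e^{-i\theta}}{2i}$$

Show that

$$\frac{1}{\sqrt{2L}},\ \frac{1}{\sqrt{L}}\cos(\pi x/L),\ \frac{1}{\sqrt{L}}\cos(2\pi x/L),\ \frac{1}{\sqrt{L}}\cos(3\pi x/L),\ \dots,\ \frac{1}{\sqrt{L}}\sin(\pi x/L),\ \frac{1}{\sqrt{L}}\sin(2\pi x/L),\ \frac{1}{\sqrt{L}}\sin(3\pi x/L),\ \dots$$

forms an orthonormal set in $\mathcal{L}^2[-L,L]$.

(b) Show that this orthonormal set is complete (i.e. forms an orthonormal basis) if and only if $\{\frac{1}{\sqrt{2L}} e^{i n \pi x/L} : n \in \mathbb{Z}\}$ forms an orthonormal basis.

(c) Show that if $n \in \mathbb{N}$, then

$$c_n e^{i n \pi x/L} + c_{-n} e^{-i n \pi x/L} = a_n \cos(n\pi x/L) + b_n \sin(n\pi x/L)$$

where $a_n = c_n + c_{-n}$ and $b_n = i(c_n - c_{-n})$.

(d) Deduce that if

$$f(x) = \sum_{n=-\infty}^{\infty} c_n e^{i n \pi x/L}$$

then

$$f(x) = \frac{a_0}{2} + \sum_{n=1}^{\infty} \left( a_n \cos(n\pi x/L) + b_n \sin(n\pi x/L) \right)$$

where

$$a_n = \frac{1}{L}\int_{-L}^{L} f(x) \cos\frac{n\pi x}{L}\, dx, \qquad b_n = \frac{1}{L}\int_{-L}^{L} f(x) \sin\frac{n\pi x}{L}\, dx$$

and vice versa.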



□ 



9.3.2 Examples 

It is useful to note that

$$\int_{-L}^{L} e^{i n \pi x / L}\, dx = 0 \qquad \text{for all } n \in \mathbb{Z},\ n \neq 0$$

Prove it!



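Exercise 9.3.6 Consider the saw-tooth function, namely the periodic extension of $f : (-L, L] \to \mathbb{R} : x \mapsto x$ to all of $\mathbb{R}$.

(a) Draw a graph of $f$.

(b) Show that the Fourier coefficients $c_n := \frac{1}{2L}\int_{-L}^{L} f(x) e^{-i n \pi x/L}\, dx$ of $f$ are given by

$$c_n = \frac{iL}{n\pi}(-1)^n \qquad (n \neq 0)$$

(c) Determine now the sine/cosine coefficients $a_n = \frac{1}{L}\int_{-L}^{L} f(x)\cos(n\pi x/L)\, dx$ and $b_n = \frac{1}{L}\int_{-L}^{L} f(x)\sin(n\pi x/L)\, dx$. Show that

$$a_n = 0 \text{ for all } n, \qquad b_n = \frac{2L}{n\pi}(-1)^{n+1}$$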






(d) Explain how you could have known in advance that the a„ are zero. [Hint: Think in 
terms of even and odd functions.] 

(e) Draw the graphs of $\sum_{n=1}^{N} b_n \sin(n\pi x/L)$ for $N = 1, 2, \dots, 6$ (use, e.g., Excel, R or Matlab) to see what is going on.
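For readers who prefer a script to a spreadsheet, the partial sums in part (e) can be tabulated as follows. This is only a sketch: it assumes the coefficients $b_n = \frac{2L}{n\pi}(-1)^{n+1}$ from part (c), takes $L = 1$ by default, and the function name `sawtooth_partial_sum` is made up for illustration. Plotting is left to whichever tool is at hand.

```python
import math

def sawtooth_partial_sum(x, N, L=1.0):
    """S_N(x) = sum_{n=1}^N b_n sin(n pi x / L), with b_n = 2L(-1)^(n+1)/(n pi)."""
    total = 0.0
    for n in range(1, N + 1):
        b_n = 2.0 * L * (-1) ** (n + 1) / (n * math.pi)
        total += b_n * math.sin(n * math.pi * x / L)
    return total

# Tabulate a few partial sums at x = 0.5, where the saw-tooth itself equals 0.5.
for N in (1, 2, 3, 6, 50, 2000):
    print(N, sawtooth_partial_sum(0.5, N))
```

At the jump $x = L$ every partial sum vanishes, which is the average of the left and right limits $L$ and $-L$; away from the jumps the sums approach $f(x)$, though slowly.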

Exercise 9.3.7 Consider the periodic extension of $f : [-L,L] \to \mathbb{R} : x \mapsto |x|$ to $\mathbb{R}$.

(a) Draw a graph of $f$.

(b) Determine now the sine/cosine coefficients $a_n = \frac{1}{L}\int_{-L}^{L} f(x)\cos(n\pi x/L)\, dx$ and $b_n = \frac{1}{L}\int_{-L}^{L} f(x)\sin(n\pi x/L)\, dx$. Explain how you can be certain in advance that the $b_n$ are zero. [Hint: Think in terms of even and odd functions.]

(c) Draw graphs of $\frac{a_0}{2} + \sum_{n=1}^{N} a_n \cos(n\pi x/L)$ for $N = 1, 2, \dots, 6$ (use, e.g., Excel, R or Matlab) to see what is going on.



Exercise 9.3.8 Consider the periodic extension of $f : [-1, 1] \to \mathbb{R}$ to all of $\mathbb{R}$, defined by



(a) Draw a graph of /. 

(b) Determine the Fourier coefficients $a_n, b_n$ of $f$.

(c) Draw graphs of $\frac{a_0}{2} + \sum_{n=1}^{N} \left( a_n \cos(n\pi x) + b_n \sin(n\pi x) \right)$ for $N = 1, 2, \dots, 6$ (use, e.g., Excel, R or Matlab) to see what is going on.



□ 




□ 



9.3.3 Proofs* 



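For simplicity (no $L$'s) we restrict ourselves to the case $L = \pi$. Throughout this section, therefore, let $f$ be a periodic function with period $2\pi$. Note that if $I$ is a closed interval of length $2\pi$, then

$$\int_I f(x)\, dx = \int_{-\pi}^{\pi} f(x)\, dx \qquad (\dagger)$$

For if $I = [a - \pi, a + \pi]$, then

$$\int_{a-\pi}^{a+\pi} f(x)\, dx = \int_{a-\pi}^{\pi} f(x)\, dx + \int_{\pi}^{a+\pi} f(x)\, dx = \int_{-\pi}^{\pi} f(x)\, dx,$$

since $\int_{\pi}^{a+\pi} f(x)\, dx = \int_{-\pi}^{a-\pi} f(x)\, dx$ by periodicity.

Define $c_n := \frac{1}{\sqrt{2\pi}} \int_{-\pi}^{\pi} f(x) e^{-inx}\, dx$.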







Theorem 9.3.9 Suppose that $f$ is a continuous periodic function with period $2\pi$, and that $x_0 \in [-\pi, \pi]$. Suppose further that the left and right derivatives $\lim_{x \uparrow x_0} \frac{f(x) - f(x_0)}{x - x_0}$ and $\lim_{x \downarrow x_0} \frac{f(x) - f(x_0)}{x - x_0}$ exist at $x_0$. Then

$$\frac{1}{\sqrt{2\pi}} \sum_{n=-\infty}^{\infty} c_n e^{i n x_0} = f(x_0)$$

(i.e. the series converges, and its limit is $f(x_0)$.)



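The proof needs a small lemma:

Lemma 9.3.10 Define $D_N(x) := \frac{1}{2\pi} \sum_{n=-N}^{N} e^{inx}$. Then

(i) $\int_{-\pi}^{\pi} D_N(x)\, dx = 1$

(ii) $D_N(x) = \frac{1}{2\pi} \cdot \frac{e^{i(N+1)x} - e^{-iNx}}{e^{ix} - 1}$ for $x \neq 0$

Proof: To obtain (i), just observe that

$$\int_{-\pi}^{\pi} e^{inx}\, dx = \begin{cases} 2\pi & \text{if } n = 0 \\ 0 & \text{else} \end{cases}$$

(ii): Define $a = e^{ix}$. Then

$$D_N(x) = \frac{1}{2\pi} \sum_{n=-N}^{N} a^n = \frac{a^{-N}}{2\pi} \sum_{n=0}^{2N} a^n = \frac{a^{-N}}{2\pi} \cdot \frac{a^{2N+1} - 1}{a - 1} = \frac{1}{2\pi} \cdot \frac{a^{N+1} - a^{-N}}{a - 1}$$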
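Both parts of the lemma are easy to sanity-check numerically. The sketch below compares the defining sum of $D_N$ with the geometric-series closed form, and approximates the integral over one period with a midpoint rule; the function names and the choice of 64 grid points are illustrative assumptions.

```python
import cmath
import math

def dirichlet_sum(x, N):
    """D_N(x) computed directly from the definition."""
    return sum(cmath.exp(1j * n * x) for n in range(-N, N + 1)) / (2 * math.pi)

def dirichlet_closed(x, N):
    """D_N(x) via the closed form (valid for x not a multiple of 2*pi)."""
    a = cmath.exp(1j * x)
    return (a ** (N + 1) - a ** (-N)) / (a - 1) / (2 * math.pi)

def integrate_over_period(g, points=64):
    """Midpoint rule on [-pi, pi]."""
    h = 2 * math.pi / points
    return h * sum(g(-math.pi + (k + 0.5) * h) for k in range(points))

print(dirichlet_sum(0.7, 5), dirichlet_closed(0.7, 5))
print(integrate_over_period(lambda x: dirichlet_sum(x, 5)))
```

The midpoint rule is a reasonable choice here because the integrand is a trigonometric polynomial, for which an equally spaced rule over a full period is very accurate.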



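Proof of Thm. 9.3.9: Let

$$S_N := \frac{1}{\sqrt{2\pi}} \sum_{n=-N}^{N} c_n e^{i n x_0} = \frac{1}{2\pi} \int_{-\pi}^{\pi} f(x) \sum_{n=-N}^{N} e^{i n (x_0 - x)}\, dx$$

and let $D_N(x) := \frac{1}{2\pi} \sum_{n=-N}^{N} e^{inx}$, so that, using a change of variables and $(\dagger)$, we have

$$S_N = \int_{-\pi}^{\pi} f(x) D_N(x_0 - x)\, dx = \int_{-\pi}^{\pi} f(x_0 - x) D_N(x)\, dx$$

According to (i) of the preceding lemma, we have $\int_{-\pi}^{\pi} D_N(x)\, dx = 1$, so that $\int_{-\pi}^{\pi} f(x_0) D_N(x)\, dx = f(x_0)$. Hence

$$S_N - f(x_0) = \int_{-\pi}^{\pi} [f(x_0 - x) - f(x_0)] D_N(x)\, dx$$

Define now

$$g(x) := \frac{f(x_0 - x) - f(x_0)}{e^{ix} - 1} = \frac{f(x_0 - x) - f(x_0)}{x} \cdot \frac{x}{e^{ix} - 1}$$

Now consider the two factors on the right in the above equation: The first factor $\frac{f(x_0 - x) - f(x_0)}{x}$ is defined and continuous everywhere on $[-\pi, \pi]$ except at $x = 0$. Nevertheless $\lim_{x \uparrow 0} \frac{f(x_0 - x) - f(x_0)}{x}$ and $\lim_{x \downarrow 0} \frac{f(x_0 - x) - f(x_0)}{x}$ exist, by assumption. The second factor $\frac{x}{e^{ix} - 1}$ is defined and continuous everywhere on $[-\pi, \pi]$ except at $x = 0$. Nevertheless, according to L'Hôpital's Rule,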






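$\lim_{x \to 0} \frac{x}{e^{ix} - 1} = -i$. It is clear, therefore, that $g(x)$ is defined and continuous everywhere on $[-\pi, \pi]$ except at $x = 0$, but that $g(0-) := \lim_{x \uparrow 0} g(x)$ and $g(0+) := \lim_{x \downarrow 0} g(x)$ exist. It follows that $g$ is bounded, and hence that $g \in \mathcal{L}^2([-\pi, \pi])$.

Using (ii) of the preceding lemma, we have

$$S_N - f(x_0) = \int_{-\pi}^{\pi} [f(x_0 - x) - f(x_0)] D_N(x)\, dx = \frac{1}{2\pi} \int_{-\pi}^{\pi} g(x) e^{i(N+1)x}\, dx - \frac{1}{2\pi} \int_{-\pi}^{\pi} g(x) e^{-iNx}\, dx$$

and hence

$$S_N - f(x_0) = \frac{1}{\sqrt{2\pi}} \left( d_{-(N+1)} - d_N \right)$$

where $d_n$ is the $n$-th Fourier coefficient of $g$. As $d_n \to 0$ as $n \to \pm\infty$ by Corollary 9.2.15, we see that $S_N \to f(x_0)$ as $N \to \infty$.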
H 

A function $f$ is said to be piecewise differentiable on a compact interval if and only if the derivative $f'(x)$ exists and is continuous everywhere except possibly at finitely many points. For such functions, we get much better convergence. Before we state this result, recall this simple but useful result for obtaining uniform convergence from pointwise convergence:

Exercise 9.3.11 Suppose that $a_n(x)$ are functions and that $\sum_{n=1}^{\infty} a_n(x) = f(x)$ for all $x$, i.e. the series converges, for all $x$, to $f(x)$. Suppose further that there are real numbers $r_n$ such that $|a_n(x)| \le r_n$ for all $x$ and all $n$. Show that if $\sum_{n=1}^{\infty} r_n < \infty$, then $\sum_{n=1}^{\infty} a_n$ converges uniformly to $f$, i.e. show that in that case, for every $\varepsilon > 0$ we can find $N$ such that for all $n \ge N$ and all $x$ we have $\left| \sum_{k=1}^{n} a_k(x) - f(x) \right| \le \varepsilon$.
[Hint: Let $\varepsilon > 0$, and choose $N$ so that $\sum_{k=n+1}^{\infty} r_k < \varepsilon$ for all $n \ge N$. (Why can we do this?) Now note that $\left| \sum_{k=1}^{n} a_k(x) - \sum_{k=1}^{m} a_k(x) \right| \le \sum_{k=n+1}^{m} |a_k(x)| \le \sum_{k=n+1}^{m} r_k < \varepsilon$ for all $m > n \ge N$, and all $x$. Let $m \to \infty$ to get $\left| \sum_{k=1}^{n} a_k(x) - f(x) \right| \le \varepsilon$ for all $n \ge N$ and all $x$.]

□ 

Next, note the following: 

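Lemma 9.3.12 Suppose that $f$ is continuous and periodic of period $2\pi$, and that it is piecewise differentiable on $[-\pi, \pi]$. For $x \in [-\pi, \pi]$ let $g(x) := f'(x)$ where $f'(x)$ is defined, and let $g(x)$ be arbitrary (e.g. set $g(x) := 0$) otherwise. Let $c_n, d_n$ be the Fourier coefficients of $f, g$ respectively. Then

$$d_n = i n c_n$$

Proof: As $f$ is piecewise differentiable, there are $-\pi = a_0 < a_1 < \dots < a_p = \pi$ so that $g$ is continuous on each $(a_{i-1}, a_i)$. Then, integrating by parts,

$$\int_{-\pi}^{\pi} g(x) e^{-inx}\, dx = \sum_{i=1}^{p} \int_{a_{i-1}}^{a_i} f'(x) e^{-inx}\, dx = \sum_{i=1}^{p} \left( \Big[ f(x) e^{-inx} \Big]_{a_{i-1}}^{a_i} + in \int_{a_{i-1}}^{a_i} f(x) e^{-inx}\, dx \right)$$

$$= \Big[ f(x) e^{-inx} \Big]_{-\pi}^{\pi} + in \int_{-\pi}^{\pi} f(x) e^{-inx}\, dx = in \int_{-\pi}^{\pi} f(x) e^{-inx}\, dx = \sqrt{2\pi}\, i n c_n$$

(because the boundary terms telescope by continuity of $f$, and $f(\pi) e^{-in\pi} = f(-\pi) e^{in\pi}$ by periodicity of $f$). Dividing by $\sqrt{2\pi}$ gives $d_n = i n c_n$.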






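Theorem 9.3.13 Suppose that $f$ is continuous and periodic of period $2\pi$, and that it is piecewise differentiable on $[-\pi, \pi]$. Also let $c_n := \frac{1}{\sqrt{2\pi}} \int_{-\pi}^{\pi} f(x) e^{-inx}\, dx$. Then

$$\frac{1}{\sqrt{2\pi}} \sum_{n=-\infty}^{\infty} c_n e^{inx} \quad \text{converges to } f(x) \text{ uniformly in } x,$$

i.e. for every $\varepsilon > 0$ there is $M \in \mathbb{N}$ such that

$$\left| \frac{1}{\sqrt{2\pi}} \sum_{n=-N}^{N} c_n e^{inx} - f(x) \right| < \varepsilon \quad \text{for all } x \text{ whenever } N \ge M$$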



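Proof of Thm. 9.3.13: For each $x$, let $S_N(x) := \frac{1}{\sqrt{2\pi}} \sum_{n=-N}^{N} c_n e^{inx}$. Then by Thm. 9.3.9, we have that $S_N(x) \to f(x)$ for each $x$. Note that $|c_n e^{inx}| = |c_n|$, so if we can show that $\sum_{n=-\infty}^{\infty} |c_n| < \infty$, then the preceding exercise will yield that $S_N$ converges to $f$ uniformly. Now let $d_n$ be the Fourier coefficients of $g = f'$. Since $g$ is piecewise continuous, $g \in \mathcal{L}^2$, and hence $\sum_{n=-\infty}^{\infty} |d_n|^2 = \|g\|^2 < \infty$. By Lemma 9.3.12, we see that $\sum_{n=-\infty}^{\infty} n^2 |c_n|^2 < \infty$. The Cauchy-Schwarz inequality for counting measure on $\mathbb{Z}$ yields that

$$\sum_{n=-\infty}^{\infty} |c_n| = |c_0| + \sum_{n \neq 0} \frac{1}{|n|} \big( |n|\, |c_n| \big) \le |c_0| + \left( \sum_{n \neq 0} \frac{1}{n^2} \right)^{1/2} \left( \sum_{n \neq 0} n^2 |c_n|^2 \right)^{1/2} < \infty$$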



H 

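At this point, it remains to show that the sequence $\phi_n(x) := \frac{1}{\sqrt{2\pi}} e^{inx}$ is an orthonormal basis. In order to do this, it suffices to show that

$$\langle f, \phi_n \rangle = 0 \text{ for all } n \in \mathbb{Z} \quad \text{implies} \quad f = 0 \text{ a.e.}$$

We first note that:

Lemma 9.3.14 If $\langle f, \phi_n \rangle = 0$ for all $n \in \mathbb{Z}$, then

$$\int_a^b f(x)\, dx = 0 \qquad \text{for all subintervals } [a, b] \subseteq [-\pi, \pi]$$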



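Proof: For $\varepsilon > 0$, define on $[-\pi, \pi]$ the function

$$J_\varepsilon(x) := \frac{x - a}{\varepsilon} I_{[a, a+\varepsilon)}(x) + I_{[a+\varepsilon, b-\varepsilon]}(x) + \frac{b - x}{\varepsilon} I_{(b-\varepsilon, b]}(x)$$

Note that $J_\varepsilon$ is a continuous piecewise differentiable approximation of the indicator function $I_{[a,b]}$, with $J_\varepsilon \uparrow I_{[a,b]}$ almost everywhere as $\varepsilon \downarrow 0$.

It follows from Thm. 9.3.13 that $S_N(J_\varepsilon) \to J_\varepsilon$ uniformly (where $(S_N(J_\varepsilon))_N$ is the sequence of partial Fourier sums for the function $J_\varepsilon$). It follows easily that also $S_N(J_\varepsilon) \to J_\varepsilon$ in $\mathcal{L}^2$, and thus that

$$\langle f, S_N(J_\varepsilon) \rangle \to \langle f, J_\varepsilon \rangle$$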






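But $\langle f, \phi_n \rangle = 0$ for all $n$, and hence $\langle f, S_N(J_\varepsilon) \rangle = 0$ for all $N$, from which it follows that $\langle f, J_\varepsilon \rangle = 0$. Thus by MCT

$$\int_a^b f\, dx = \int_{-\pi}^{\pi} f I_{[a,b]}\, dx = \lim_{\varepsilon \downarrow 0} \int_{-\pi}^{\pi} f J_\varepsilon\, dx = \lim_{\varepsilon \downarrow 0} \langle f, J_\varepsilon \rangle = 0$$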

H 

Proof of Thm. 9.3.1: Suppose that $\langle f, \phi_n \rangle = 0$ for all $n \in \mathbb{Z}$. Let

$$\mathcal{D} := \left\{ B \in \mathcal{B}[-\pi, \pi] : \int_B f\, dx = 0 \right\}$$

It is easy to see that $\mathcal{D}$ is a $\lambda$-system. By the previous lemma, $\mathcal{D}$ contains all closed subintervals of $[-\pi, \pi]$, and hence, by the Monotone Class Theorem ($\pi$-$\lambda$ version) we have $\mathcal{D} = \mathcal{B}[-\pi, \pi]$. It follows immediately that $f = 0$ a.e.

H 






Chapter 10 



Weak Convergence and 
Characteristic Functions 

10.1 Weak Convergence and Convergence in Distribution 

We will only scratch the surface of the theory of weak convergence, also known as convergence in distribution of probability measures, and restrict ourselves entirely to measures on $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$. Define

$\mathcal{M}_1(\mathbb{R})$ := set of all probability measures on $\mathbb{R}$

and

$C_b(\mathbb{R})$ := set of all bounded continuous functions $\mathbb{R} \to \mathbb{R}$

Recall also that if $X : (\Omega, \mathcal{F}, \mathbb{P}) \to (\mathbb{R}, \mathcal{B}(\mathbb{R}))$ is a random variable, then the law (or distribution) of $X$ is a probability measure $\mu_X$ on $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$ defined by

$$\mu_X(B) := \mathbb{P}X^{-1}(B) = \mathbb{P}(X \in B)$$

The definition of weak convergence uses both measure-theoretic and topological proper- 
ties: 



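Definition 10.1.1 (a) Let $\mu_n, \mu \in \mathcal{M}_1(\mathbb{R})$ for $n \in \mathbb{N}$. We say that $(\mu_n)_n$ converges weakly to $\mu$, and write

$$\mu_n \xrightarrow{w} \mu$$

iff for each $f \in C_b(\mathbb{R})$ we have

$$\lim_{n \to \infty} \int f\, d\mu_n = \int f\, d\mu$$

(b) If $X_n, X$ are real-valued random variables, we say that $(X_n)_n$ converges in distribution to $X$, and write $X_n \xrightarrow{d} X$, if and only if $\mu_{X_n} \xrightarrow{w} \mu_X$.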





Remarks 10.1.2 Note that, in the definition of convergence in distribution of random variables, the $X_n, X$ are not required to be defined on the same space: it is only their laws that matter. Nevertheless, if the $X_n, X$ are all defined on the same probability space then, seeing that $\int f\, d\mu_X = \mathbb{E}[f(X)]$ by the change of variables formula for integrals, we have

$$X_n \xrightarrow{d} X \quad \text{iff} \quad \mathbb{E}[f(X_n)] \to \mathbb{E}[f(X)] \text{ for all } f \in C_b(\mathbb{R})$$






□ 

Exercise 10.1.3 1. Show that if $X_n \to X$ almost surely, then $X_n \xrightarrow{d} X$.
2. Show that if $X_n \to X$ in probability, then $X_n \xrightarrow{d} X$.

□ 

In elementary (non-measure-theoretic) probability texts, the definition of convergence in distribution is often given in terms of distribution functions. Recall that a function $F : \mathbb{R} \to [0, 1]$ is a distribution function provided that

(i) $F$ is increasing, i.e. $x \le y \Rightarrow F(x) \le F(y)$.

(ii) $\lim_{x \to -\infty} F(x) = 0$ and $\lim_{x \to +\infty} F(x) = 1$.

(iii) $F$ is right-continuous, i.e. $\lim_{y \downarrow x} F(y) = F(x)$ for all $x \in \mathbb{R}$.

As $F$ is increasing, $F(x-) := \lim_{y \uparrow x} F(y)$ exists. $F$ is discontinuous at $x$ precisely when $F(x) \neq F(x-)$. We shall call a point $x \in \mathbb{R}$ where a distribution function $F$ is discontinuous an atom of the distribution function. If $F$ is the distribution function of some random variable $X$, then $x$ is an atom of $F$ if and only if $\mathbb{P}(X = x) = F(x) - F(x-) > 0$.

Recall also that every probability measure $\mu$ on $\mathbb{R}$ induces a distribution function: Simply define $F(x) = \mu(-\infty, x]$. The converse is also true: Any distribution function $F$ yields a unique probability measure $\mu$ on $(\mathbb{R}, \mathcal{B})$, defined first by $\mu(-\infty, x] = F(x)$ and then extended to all Borel sets by Carathéodory's extension theorem.

Definition 10.1.4 (Non-measure-theoretic definition of convergence in distribution)

Let $F_{X_n}, F_X$ be the distribution functions of random variables $X_n, X$. We say that $X_n$ converges to $X$ in distribution, written $X_n \xrightarrow{d} X$, provided that

$$F_{X_n}(x) \to F_X(x) \quad \text{as } n \to +\infty$$

at every point $x$ where $F_X$ is continuous.

Example 10.1.5 Consider constant random variables $X_n := \frac{1}{n}$ on some probability space $(\Omega, \mathcal{F}, \mathbb{P})$. The law of $X_n$ is just the Dirac delta $\delta_{1/n}$, i.e. the point mass at $\frac{1}{n}$. Clearly, we ought to have $\delta_{1/n} \to \delta_0$, as the points $\frac{1}{n}$ lie closer and closer to the point 0. Nevertheless, we do not have $\delta_{1/n}(B) \to \delta_0(B)$ for every Borel set: For example

$$\delta_{1/n}\{0\} \not\to \delta_0\{0\}$$

as $\delta_{1/n}\{0\} = 0$ for all $n$, whereas $\delta_0\{0\} = 1$. Similarly, we do not have

$$\int f(x)\, \delta_{1/n}(dx) \to \int f(x)\, \delta_0(dx) \quad \text{for all measurable functions } f:$$

take $f = I_{\{0\}}$ or $f = I_{(-\infty, 0]}$, for example.

However, the notion of $\frac{1}{n}$ lying "closer and closer" to 0 is a topological notion. If $f$ respects topology, i.e. if $f$ preserves the "closeness" relation, i.e. if $f$ is continuous, then we have $\lim_n f(\frac{1}{n}) = f(0)$, i.e.

$$\lim_n \int f(x)\, \delta_{1/n}(dx) = \lim_n f(\tfrac{1}{n}) = f(0) = \int f(x)\, \delta_0(dx)$$

□ 
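The contrast in this example can be seen in a few lines of code. The sketch below uses the fact that integrating against a Dirac measure is just point evaluation; the helper names are hypothetical, and $\cos$ stands in for an arbitrary bounded continuous function.

```python
import math

def integral_against_dirac(f, point):
    """The integral of f with respect to the point mass at `point` is f(point)."""
    return f(point)

def indicator_of_zero(x):
    """I_{0}: measurable and bounded, but not continuous."""
    return 1.0 if x == 0 else 0.0

def bounded_continuous(x):
    """An arbitrary element of C_b(R), chosen for illustration."""
    return math.cos(x)

for n in (1, 10, 1000, 10**6):
    print(n,
          integral_against_dirac(indicator_of_zero, 1.0 / n),
          integral_against_dirac(bounded_continuous, 1.0 / n))

print("limit measure:",
      integral_against_dirac(indicator_of_zero, 0.0),
      integral_against_dirac(bounded_continuous, 0.0))
```

The indicator column stays at 0 while the limit measure gives 1, whereas the continuous test function converges to its value at 0.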






Lemma 10.1.6 A distribution function can have at most countably many points where it is discontinuous.

Proof: Let $D := \{x \in \mathbb{R} : F(x) - F(x-) > 0\}$ be the set of all points where $F$ is discontinuous, and, for $n \in \mathbb{N}$, define $D_n := \{x \in \mathbb{R} : F(x) - F(x-) > \frac{1}{n}\}$. Clearly, $D = \bigcup_n D_n$. But since $0 \le F(x) \le 1$ for all $x$, and since $F$ is increasing, each $D_n$ can have at most $n$ elements.

H 

Proposition 10.1.7 Let $F_n, F$ be distribution functions on $\mathbb{R}$ for $n \in \mathbb{N}$ (in the sense defined above), and suppose that $F_n(x) \to F(x)$ at every point $x$ where $F$ is continuous. Then there exists a probability space $(\Omega, \mathcal{F}, \mathbb{P})$ carrying random variables $X_n, X$ (for $n \in \mathbb{N}$) such that

$$F_n = F_{X_n}, \quad F = F_X \quad \text{and} \quad X_n \to X \text{ a.s.}$$

Proof: Let $(\Omega, \mathcal{F}, \mathbb{P})$ be $([0, 1], \mathcal{B}[0, 1], \lambda)$ and define

$$X^+(\omega) = \inf\{y \in \mathbb{R} : F(y) > \omega\} = \sup\{y \in \mathbb{R} : F(y) \le \omega\}$$
$$X^-(\omega) = \inf\{y \in \mathbb{R} : F(y) \ge \omega\} = \sup\{y \in \mathbb{R} : F(y) < \omega\}$$

and define $X_n^+(\omega)$ and $X_n^-(\omega)$ similarly, with $F_n$ in place of $F$.

Note that we always have $X^-(\omega) \le X^+(\omega)$. If $\omega \in [0, 1]$, then $[X^-(\omega), X^+(\omega)]$ is a closed subinterval of $\mathbb{R}$, possibly degenerate, i.e. consisting of just a single point. When the interval $[X^-(\omega), X^+(\omega)]$ is non-degenerate, then $F$ is constant with value $\omega$ on its interior. Clearly, therefore, if $[X^-(\omega_1), X^+(\omega_1)] = [X^-(\omega_2), X^+(\omega_2)]$ is non-degenerate, then $\omega_1 = \omega_2$. As each non-degenerate interval must contain a rational number, there are at most countably many $\omega$ for which $[X^-(\omega), X^+(\omega)]$ is non-degenerate, i.e. at most countably many $\omega \in [0, 1]$ for which $X^+(\omega) \neq X^-(\omega)$. Thus $X^+ = X^-$ $\lambda$-a.s.

Note that by right-continuity of $F$, we have

$$\omega \le F(x) \iff X^-(\omega) \le x$$

and thus

$$\mathbb{P}(X^- \le x) = \lambda\{\omega \in [0, 1] : X^-(\omega) \le x\} = \lambda[0, F(x)] = F(x)$$

This shows that the distribution function of $X^-$ is $F$. As $X^+ = X^-$ a.s., $X^+$ has the same distribution as $X^-$, i.e. the distribution function of $X^+$ is also $F$.

In the same way it follows that the random variables $X_n^+$ and $X_n^-$ have distribution functions $F_n$.

For definiteness, define $X := X^+$ and $X_n := X_n^+$. We now show that $X_n^+ \to X$ a.s. and that $X_n^- \to X$ a.s. For let $\omega \in \Omega$, and let $y$ be a point of continuity of $F$ such that $y > X^+(\omega)$. Then $F(y) > \omega$, and hence, for all sufficiently large $n$, also $F_n(y) > \omega$ (because $F_n(y) \to F(y)$). Hence $y \ge X_n^+(\omega)$ for all sufficiently large $n$, so that $\limsup_n X_n^+(\omega) \le y$. Now since there are at most countably many points where $F$ is not continuous, we may let $y \downarrow X^+(\omega)$ through continuity points, and so we must have

$$\limsup_n X_n^+(\omega) \le X^+(\omega)$$

In the same way, we can prove that

$$\liminf_n X_n^-(\omega) \ge X^-(\omega)$$






Putting these inequalities together, we see that

$$X^-(\omega) \le \liminf_n X_n^-(\omega) \le \limsup_n X_n^+(\omega) \le X^+(\omega)$$

Now since $X^- = X^+$ a.s., we must have $X_n^- \to X$ and $X_n^+ \to X$ a.s.
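The random variable $X^-$ in this proof is the familiar quantile transform. As a rough illustration, the sketch below builds $X^-$ for a hypothetical discrete law (atoms 0, 1, 3 with masses 0.2, 0.5, 0.3, chosen purely for the example) and checks on a grid of $\omega$ values that $\lambda\{\omega : X^-(\omega) \le x\}$ reproduces $F(x)$.

```python
import bisect

# Hypothetical discrete law: atoms 0, 1, 3 with masses 0.2, 0.5, 0.3.
ATOMS = [0.0, 1.0, 3.0]
CDF_AT_ATOMS = [0.2, 0.7, 1.0]  # F evaluated at each atom

def x_minus(omega):
    """X^-(omega) = inf{y : F(y) >= omega}, for omega in (0, 1]."""
    index = bisect.bisect_left(CDF_AT_ATOMS, omega)
    return ATOMS[index]

def measure_of_x_minus_below(x, grid_size=1000):
    """Approximate Lebesgue measure of {omega : X^-(omega) <= x} on a midpoint grid."""
    hits = sum(1 for k in range(grid_size) if x_minus((k + 0.5) / grid_size) <= x)
    return hits / grid_size

for x in (-1.0, 0.0, 1.0, 2.0, 3.0):
    print(x, measure_of_x_minus_below(x))
```

The printed values should match $F$ at the chosen points, which is the content of the identity $\mathbb{P}(X^- \le x) = F(x)$ derived above.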



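Theorem 10.1.8 Suppose that $\mu_n, \mu$ are probability distributions on $\mathbb{R}$ and that $F_n, F$ are the associated distribution functions. Then $\mu_n \xrightarrow{w} \mu$ if and only if $F_n(x) \to F(x)$ at every point $x \in \mathbb{R}$ where $F$ is continuous.

Proof: First suppose that $\mu_n \xrightarrow{w} \mu$. Let $x \in \mathbb{R}$ and let $\delta > 0$. Define a bounded continuous function $f$ by

$$f(y) = \begin{cases} 1 & \text{if } y \le x \\ 1 - \delta^{-1}(y - x) & \text{if } x < y < x + \delta \\ 0 & \text{if } y \ge x + \delta \end{cases}$$

Thus if $\delta$ is small, then $f$ is a continuous approximation of a step function which jumps from 1 to 0 at $x$.

Note that $I_{(-\infty, x]} \le f \le I_{(-\infty, x+\delta]}$. Since $\mu_n \xrightarrow{w} \mu$, we must have

$$\limsup_n F_n(x) = \limsup_n \int I_{(-\infty, x]}\, d\mu_n \le \limsup_n \int f\, d\mu_n = \int f\, d\mu \le \int I_{(-\infty, x+\delta]}\, d\mu = F(x + \delta)$$

Using right continuity of $F$, we see, upon letting $\delta \downarrow 0$, that $\limsup_n F_n(x) \le F(x)$ for all $x \in \mathbb{R}$.

Similarly, define

$$g(y) = \begin{cases} 1 & \text{if } y \le x - \delta \\ 1 - \delta^{-1}(y - (x - \delta)) & \text{if } x - \delta < y < x \\ 0 & \text{if } y \ge x \end{cases}$$

so that $I_{(-\infty, x-\delta]} \le g \le I_{(-\infty, x)}$. Then

$$F_n(x-) = \mu_n(-\infty, x) \ge \int g\, d\mu_n$$

Since $\mu_n \xrightarrow{w} \mu$, we must have

$$\liminf_n F_n(x-) \ge \liminf_n \int g\, d\mu_n = \int g\, d\mu \ge F(x - \delta)$$

Letting $\delta \downarrow 0$, we get

$$\liminf_n F_n(x-) \ge F(x-)$$

for all $x \in \mathbb{R}$.

It now follows that

$$F(x-) \le \liminf_n F_n(x-) \le \liminf_n F_n(x) \le \limsup_n F_n(x) \le F(x)$$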






In particular, if $x$ is a point of continuity of $F$ (i.e. if $F(x-) = F(x)$), then

$$\lim_n F_n(x) = F(x)$$

as required. This proves the forward direction.

Now assume that $F_n(x)$ converges to $F(x)$ at every continuity point of $F$. By Proposition 10.1.7, there is a probability space $(\Omega, \mathcal{F}, \mathbb{P})$ which carries random variables $X_n, X$ with the properties that

$$F_{X_n} = F_n, \quad F_X = F \quad \text{and} \quad X_n \to X \text{ a.s.}$$

If $f \in C_b(\mathbb{R})$ is bounded by a constant $K$, then the $f(X_n)$ are random variables which are bounded by $K$ as well, and $f(X_n) \to f(X)$ a.s. by continuity of $f$. By the Lebesgue Dominated Convergence Theorem, we therefore have

$$\int f\, d\mu_n = \int f(X_n)\, d\mathbb{P} \to \int f(X)\, d\mathbb{P} = \int f\, d\mu$$

Thus $\mu_n \xrightarrow{w} \mu$, proving the reverse direction.



□ 



We now seek a kind of compactness condition for probability measures on $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$.

Definition 10.1.9 A sequence $(\mu_n)_n$ of probability measures on $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$ is said to be tight if and only if

$$\sup_n \mu_n\{x : |x| > K\} \to 0 \quad \text{as } K \to \infty$$

A sequence of distribution functions $(F_n)$ on $\mathbb{R}$ is tight if and only if the corresponding probability distributions form a tight sequence.

Remarks 10.1.10 (a) You should verify the following directly from the definition: $(\mu_n)_n$ is tight if and only if for every $\varepsilon > 0$ there exists a $K > 0$ such that

$$\mu_n[-K, K] > 1 - \varepsilon \quad \text{for all } n \in \mathbb{N}$$

In other words, most of the mass of each $\mu_n$ lies on a single compact interval $[-K, K]$, which is the same for all $\mu_n$.

(b) It should be clear that a single probability distribution $\mu$ on $\mathbb{R}$ is tight, i.e. that, for any $\varepsilon > 0$ there is $K$ such that $\mu\{x : |x| > K\} < \varepsilon$. Indeed, since $[-n, n] \uparrow \mathbb{R}$, we have $\mu[-n, n] \uparrow 1$, and thus there is $K$ such that $\mu[-K, K] > 1 - \varepsilon$.

In the same way, it can be shown that any finite set $\{\mu_1, \dots, \mu_n\}$ is tight: Choose $K_j$ so that $\mu_j[-K_j, K_j] > 1 - \varepsilon$, and then define $K := \max\{K_1, \dots, K_n\}$. Clearly $\mu_j[-K, K] \ge \mu_j[-K_j, K_j] > 1 - \varepsilon$ for all $j = 1, \dots, n$.

(c) If $\mu_n$ is the distribution of a $N(m_n, 1)$-normal random variable, and the sequence of means $(m_n)$ is bounded, then $(\mu_n)_n$ is tight.

(d) If $\mu_n$ is the distribution of a $N(0, n)$-normal random variable, then $(\mu_n)_n$ is not tight.

(e) If $\mu_n = \delta_n$, then $(\mu_n)_n$ is not tight.



□ 
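Remarks (c) and (d) can be explored numerically. The sketch below computes the tail mass $\mu\{x : |x| > K\}$ of a normal law via the complementary error function and takes the maximum over the first few thousand members of each family. That maximum is only a finite stand-in for the supremum in the definition, and the particular means $m_n = \sin n$ are an arbitrary bounded choice made for the example.

```python
import math

def normal_tail_mass(K, mean, variance):
    """mu{x : |x| > K} for mu = N(mean, variance)."""
    scale = math.sqrt(2.0 * variance)
    upper = 0.5 * math.erfc((K - mean) / scale)   # P(X > K)
    lower = 0.5 * math.erfc((K + mean) / scale)   # P(X < -K)
    return upper + lower

def bounded_means(n):
    """Parameters of N(m_n, 1) with the (assumed) bounded means m_n = sin(n)."""
    return math.sin(n), 1.0

def growing_variance(n):
    """Parameters of N(0, n)."""
    return 0.0, float(n)

def sup_tail(K, family, n_max=10_000):
    """Largest tail mass over n = 1..n_max (a finite proxy for the supremum)."""
    return max(normal_tail_mass(K, *family(n)) for n in range(1, n_max + 1))

for K in (2.0, 4.0, 8.0):
    print(K, sup_tail(K, bounded_means), sup_tail(K, growing_variance))
```

For the bounded-mean family the worst tail shrinks rapidly as $K$ grows, while for the $N(0, n)$ family it stays close to 1 at every fixed $K$ once $n$ is large.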






Theorem 10.1.11 (Helly-Bray)

(a) Let $F_n$ be a sequence of distribution functions on $\mathbb{R}$. Then there exists a right-continuous non-decreasing function $F : \mathbb{R} \to [0, 1]$ and a subsequence $F_{n_k}$ such that

$$\lim_{k \to \infty} F_{n_k}(x) = F(x)$$

at every point of continuity of $F$.

(b) If, moreover, the $(F_n)_n$ are tight, then $F$ is a distribution function, and $F_{n_k} \xrightarrow{w} F$.

Proof: (a) Enumerate the rationals (or any other countable dense subset of $\mathbb{R}$), i.e. $\mathbb{Q} = \{q_n : n \in \mathbb{N}\}$. Since the sequence

$$(F_n(q_1))_{n \in \mathbb{N}}$$

is bounded (because it lies in $[0, 1]$) it must have a convergent subsequence, by the Bolzano-Weierstrass theorem, i.e. there exists a sequence $(n^{(1)}_k)_k$ in $\mathbb{N}$ such that $F_{n^{(1)}_k}(q_1)$ converges to some value $\tilde{F}(q_1)$ as $k \to +\infty$. Since $F_{n^{(1)}_k}(q_2)$ is bounded, it has a convergent subsequence, i.e. there exists a subsequence $(n^{(2)}_k)_k$ of $(n^{(1)}_k)_k$ such that $F_{n^{(2)}_k}(q_2)$ converges to some value $\tilde{F}(q_2)$ as $k \to +\infty$. Since a subsequence of a convergent sequence converges, and to the same value, we also have $F_{n^{(2)}_k}(q_1) \to \tilde{F}(q_1)$. Keep going in this way: Since $F_{n^{(2)}_k}(q_3)$ is bounded, there is a subsequence $(n^{(3)}_k)_k$ of $(n^{(2)}_k)_k$ such that $F_{n^{(3)}_k}(q_3)$ converges to some value $\tilde{F}(q_3)$. Then, since $(n^{(3)}_k)_k$ is a subsequence of $(n^{(2)}_k)_k$ and thus also of $(n^{(1)}_k)_k$, we must also have $F_{n^{(3)}_k}(q_2) \to \tilde{F}(q_2)$ and $F_{n^{(3)}_k}(q_1) \to \tilde{F}(q_1)$.

We now pick the diagonal of the sequences $(n^{(i)}_k)_k$: Define $n_k := n^{(k)}_k$. Note that the $k$-th tail $\{n_m : m \ge k\}$ of $(n_k)_k$ is a subsequence of $(n^{(i)}_j)_j$ for each $i \le k$. It follows that

$$F_{n_k}(q_i) \to \tilde{F}(q_i) \quad \text{for all } q_i \in \mathbb{Q}$$

Clearly $\tilde{F}$ is an increasing function $\mathbb{Q} \to [0, 1]$, but it hasn't been defined for all $x \in \mathbb{R}$. For every real number $x$, however, we can find a strictly decreasing sequence of rationals $q \downarrow x$. Thus define

$$F(x) = \lim_{q \downarrow x} \tilde{F}(q)$$

where $q$ strictly decreases to $x$.

Then $F$ has all the required properties. (Note, however, that $F(q)$ and $\tilde{F}(q)$ may not be equal.) $F$ is clearly increasing. To see that it is right-continuous, let $x \in \mathbb{R}$ and $\varepsilon > 0$. Choose $q \in \mathbb{Q}$ such that $x < q$ and $\tilde{F}(q) < F(x) + \varepsilon$. This can be done by definition of $F$. Then for every $y$ with $x < y < q$ we have

$$F(x) \le F(y) \le \tilde{F}(q) < F(x) + \varepsilon$$

which proves right-continuity.






Finally, suppose that $F$ is continuous at $x$. If $\varepsilon > 0$, we may choose $y < x$ such that $F(x) - \varepsilon < F(y)$ (by continuity). We may also pick $q, r \in \mathbb{Q}$ such that $y < r < x < q$ and $\tilde{F}(q) < F(x) + \varepsilon$. Since

$$F(x) - \varepsilon < F(y) \le \tilde{F}(r) \le \tilde{F}(q) < F(x) + \varepsilon$$

and

$$F_n(r) \le F_n(x) \le F_n(q)$$

it follows that

$$F(x) - \varepsilon < \lim_{k \to \infty} F_{n_k}(r) \le \liminf_{k \to \infty} F_{n_k}(x) \le \limsup_{k \to \infty} F_{n_k}(x) \le \lim_{k \to \infty} F_{n_k}(q) < F(x) + \varepsilon$$

Thus $F_{n_k}(x) \to F(x)$ at every point $x$ where $F$ is continuous.

(b) We need only show that the function $F$ obtained in (a) exhibits the correct end behaviour, since we already know that it is non-decreasing and right-continuous.

But this is easy: For example, to show that $\lim_{x \to +\infty} F(x) = 1$, proceed as follows. Let $\varepsilon > 0$. By tightness, we can find a $K > 0$ such that $F_n(K) - F_n(-K) > 1 - \varepsilon$ for all $n$. Moreover, $K$ can be taken to be a point of continuity of $F$, because $F$ has at most countably many atoms. Then $F_n(K) > 1 - \varepsilon$ for all $n$. Thus if $x > K$, then $F(x) \ge F(K) = \lim_{k \to +\infty} F_{n_k}(K) \ge 1 - \varepsilon$.

H 



Translating from the language of distribution functions to that of distributions, we obtain

Corollary 10.1.12 If $(\mu_n)_n$ is a tight sequence of probability distributions on $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$, then it has a weakly convergent subsequence, i.e. there is a probability distribution $\mu$ and a subsequence $(\mu_{n_k})_k$ such that $\mu_{n_k} \xrightarrow{w} \mu$.



10.2 Characteristic Functions 
10.2.1 Basic Properties 

In this section we introduce the characteristic function of a random variable. This is intimately related to the Fourier transform, although this kinship will become apparent only later in this section.

Definition 10.2.1 (a) If $\mu$ is a probability measure on $(\mathbb{R}, \mathcal{B})$, then the characteristic function of $\mu$ is the function $\varphi : \mathbb{R} \to \mathbb{C}$ defined by

$$\varphi(t) := \int_{\mathbb{R}} e^{itx}\, \mu(dx)$$

(b) If $X$ is a random variable on a probability space $(\Omega, \mathcal{F}, \mathbb{P})$, then the characteristic function $\varphi_X : \mathbb{R} \to \mathbb{C}$ of $X$ is defined to be the characteristic function of the distribution $\mu_X$ of $X$.

Remarks 10.2.2 (a) Clearly identically distributed random variables have identical characteristic functions. We shall soon prove the converse to this, i.e. that random variables with identical characteristic functions are also identically distributed.






(b) Suppose that $X$ is a random variable on a probability space $(\Omega, \mathcal{F}, \mathbb{P})$. By the Change of Variables Theorem, we have

$$\varphi_X(t) = \int_{\mathbb{R}} e^{itx}\, d\mu_X = \int_{\Omega} e^{itX}\, d\mathbb{P} = \mathbb{E}[e^{itX}]$$

The characteristic function of a random variable $X$ is therefore frequently defined by $\varphi_X(t) := \mathbb{E}[e^{itX}]$.

(c) In a PDE course, the Fourier transform $F = \mathcal{F}(f)$ of a piecewise continuous real-valued function $f(x)$ is generally defined by

$$F(t) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} f(x) e^{itx}\, dx$$

Thus, if $X$ is a continuous random variable with density $f_X$, then the characteristic function of $X$ is just the Fourier transform of $f_X$ (apart from a factor of $\sqrt{2\pi}$). To be precise:

$$\varphi_X = \sqrt{2\pi}\, \mathcal{F}(f_X)$$

(d) Of course,

$$\varphi_X(t) = \int \cos(tX)\, d\mathbb{P} + i \int \sin(tX)\, d\mathbb{P} = \mathbb{E}\cos(tX) + i\, \mathbb{E}\sin(tX)$$



□ 



Because the characteristic function of a continuous random variable is just the Fourier transform of its density function (up to a constant factor) the two have very similar behavioural properties. Note, however, that the characteristic function of a random variable $X$ is defined even if $X$ is not continuous: Since $|e^{itX}| = 1$, the random variable $e^{itX} = \cos tX + i \sin tX$ is bounded, and hence integrable.

Proposition 10.2.3 (Basic Properties of Characteristic Functions) Let $\varphi_X$ be the characteristic function of some random variable $X$.

(a) $\varphi_X(0) = 1$

(b) $|\varphi_X(t)| \le 1$ for all $t \in \mathbb{R}$

(c) $\varphi_X$ is continuous (in fact, uniformly continuous)

(d) $\varphi_{aX+b}(t) = e^{ibt} \varphi_X(at)$

(e) $\varphi_{-X}(t) = \varphi_X(-t) = \overline{\varphi_X(t)}$ (= complex conjugate of $\varphi_X(t)$)

Proof: (a) is obvious.

(b) follows from the fact that $|\int f\, d\mu| \le \int |f|\, d\mu$.

(c) Note that $e^{iuX} \to e^{itX}$ as $u \to t$. The result follows by the Lebesgue Dominated Convergence Theorem, as the family of $e^{iuX}$ is dominated by an integrable random variable ($|e^{iuX}| \le 1$). Uniform continuity is now easy to see.

(d) is straightforward.

(e) follows from the fact that both $\varphi_{-X}(t)$ and $\varphi_X(-t)$ are equal to $\mathbb{E}[e^{-itX}]$. Now

$$\mathbb{E}[e^{-itX}] = \mathbb{E}\cos(tX) - i\, \mathbb{E}\sin(tX) = \overline{\mathbb{E}\cos(tX) + i\, \mathbb{E}\sin(tX)} = \overline{\mathbb{E}[e^{itX}]}$$






The following trivial result is nevertheless of great importance: 



Theorem 10.2.4 If $X, Y$ are independent random variables on some probability space $(\Omega, \mathcal{F}, \mathbb{P})$, then

$$\varphi_{X+Y}(t) = \varphi_X(t)\, \varphi_Y(t)$$



Exercise 10.2.5 Prove the preceding theorem. 

□ 

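Examples 10.2.6 We give here some examples of characteristic functions of random variables:

(a) Normal distribution: Assume that $X$ is normally distributed with mean 0 and variance 1. Then it has density function

$$f_X(x) = \frac{1}{\sqrt{2\pi}} e^{-x^2/2}$$

Thus

$$\varphi_X(t) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-x^2/2} \cos tx\, dx + \frac{i}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-x^2/2} \sin tx\, dx$$

However, $e^{-x^2/2} \sin tx$ is an odd function of $x$, and thus the second integral on the right is 0. It follows that

$$\varphi_X(t) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-x^2/2} \cos tx\, dx$$

Differentiating with respect to $t$ yields

$$\varphi_X'(t) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} -x e^{-x^2/2} \sin tx\, dx$$

If we integrate this integral by parts, we obtain

$$\varphi_X'(t) = -\frac{t}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-x^2/2} \cos tx\, dx = -t \varphi_X(t)$$

This is a first-order separable differential equation with solution

$$\varphi_X(t) = \varphi_X(0) e^{-t^2/2}$$

Since $\varphi_X(0) = 1$, we thus obtain

$$\varphi_X(t) = e^{-t^2/2}$$

Now if $Y = \sigma X + \mu$, then $Y$ is a normally distributed random variable with mean $\mu$ and variance $\sigma^2$. It follows that

$$\varphi_Y(t) = \varphi_{\sigma X + \mu}(t) = e^{i\mu t} \varphi_X(\sigma t) = e^{i\mu t - \sigma^2 t^2/2}$$

(b) Suppose that $X$ is a Bernoulli variable, with $\mathbb{P}(X = 1) = \mathbb{P}(X = -1) = \frac{1}{2}$. Then

$$\varphi_X(t) = \mathbb{E}e^{itX} = \frac{e^{it}}{2} + \frac{e^{-it}}{2} = \cos t$$

(c) Suppose that $X$ is Poisson distributed with rate $\lambda$. Then

$$\varphi_X(t) = \sum_{k=0}^{\infty} \frac{\lambda^k}{k!} e^{-\lambda} e^{itk} = e^{-\lambda} \sum_{k=0}^{\infty} \frac{(\lambda e^{it})^k}{k!} = e^{\lambda(e^{it} - 1)}$$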






(d) If $X$ is uniformly distributed over $[a, b]$, then, for $t \neq 0$,

$$\varphi_X(t) = \frac{e^{itb} - e^{ita}}{it(b - a)}$$



□ 
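The closed forms in these examples can be checked against a direct numerical evaluation of $\mathbb{E}[e^{itX}]$. The following is a sketch under simple assumptions: the normal case uses a midpoint rule truncated to $[-10, 10]$, and the Poisson case truncates the series after 100 terms; both cut-offs and all function names are illustrative choices.

```python
import cmath
import math

def cf_normal_numeric(t, half_width=10.0, steps=20000):
    """Approximate E[exp(itX)] for X ~ N(0, 1) by a midpoint rule on [-half_width, half_width]."""
    h = 2.0 * half_width / steps
    total = 0j
    for k in range(steps):
        x = -half_width + (k + 0.5) * h
        total += cmath.exp(1j * t * x) * math.exp(-x * x / 2.0)
    return total * h / math.sqrt(2.0 * math.pi)

def cf_poisson_numeric(t, lam, terms=100):
    """Approximate E[exp(itX)] for X ~ Poisson(lam) by a truncated series."""
    total = 0j
    weight = math.exp(-lam)            # P(X = 0)
    for k in range(terms):
        total += weight * cmath.exp(1j * t * k)
        weight *= lam / (k + 1)        # P(X = k + 1) from P(X = k)
    return total

t, lam = 1.3, 2.5
print(cf_normal_numeric(t), math.exp(-t * t / 2.0))
print(cf_poisson_numeric(t, lam), cmath.exp(lam * (cmath.exp(1j * t) - 1)))
```

In each printed pair the numerical value and the closed form should agree to many decimal places.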



Characteristic functions and moments are related in the following way: 



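Proposition 10.2.7 Suppose that a random variable $X$ has an $n$-th moment, i.e. that

$$\mathbb{E}|X|^n < +\infty$$

Then its characteristic function $\varphi_X(t)$ has a continuous derivative of order $n$, and

$$\varphi_X^{(n)}(t) = \mathbb{E}[(iX)^n e^{itX}]$$

Proof: Note that $\mathbb{E}[X^n e^{itX}]$ exists if and only if $\mathbb{E}|X|^n < +\infty$, i.e. if and only if $X$ has an $n$-th moment.

We tackle first the case $n = 1$. Fix $t \in \mathbb{R}$, and suppose that $\mathbb{E}X$ exists. Then

$$\lim_{h \to 0} \frac{\varphi_X(t + h) - \varphi_X(t)}{h} = \lim_{h \to 0} \int e^{itx}\, \frac{e^{ihx} - 1}{h}\, \mu_X(dx)$$

But

$$\left| e^{itx}\, \frac{e^{ihx} - 1}{h} \right| \le |x|$$

(You may wish to consult the "circle inequality" proved later in this chapter.) As $|x|$ is $\mu_X$-integrable, we may apply the Lebesgue Dominated Convergence Theorem to deduce

$$\lim_{h \to 0} \frac{\varphi_X(t + h) - \varphi_X(t)}{h} = \int e^{itx} \lim_{h \to 0} \frac{e^{ihx} - 1}{h}\, \mu_X(dx) = \int i x e^{itx}\, \mu_X(dx)$$

which proves

$$\varphi_X'(t) = i\, \mathbb{E}[X e^{itX}]$$

This yields the result for $n = 1$. Repeating the argument will establish the result for higher $n$.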

H 

It follows that one may calculate the moments of a random variable directly from its characteristic function. Putting $t = 0$ in the above yields:

Proposition 10.2.8 If $X$ is a random variable with characteristic function $\varphi_X(t)$, then

$$\mathbb{E}X^n = i^{-n} \varphi_X^{(n)}(0)$$
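As a small illustration of this formula, the sketch below approximates $\varphi_X'(0)$ and $\varphi_X''(0)$ by central finite differences for a Poisson variable, using the closed form $\varphi_X(t) = e^{\lambda(e^{it} - 1)}$ from the examples, and then divides by $i^n$. The step size and the helper name `moment_from_cf` are assumptions made for the example; the finite differences are only approximations of the derivatives.

```python
import cmath

def cf_poisson(t, lam):
    """Characteristic function of a Poisson(lam) variable."""
    return cmath.exp(lam * (cmath.exp(1j * t) - 1))

def moment_from_cf(cf, order, h=1e-3):
    """Estimate E[X^order] as i^(-order) * (finite-difference derivative of cf at 0)."""
    if order == 1:
        derivative = (cf(h) - cf(-h)) / (2.0 * h)
    elif order == 2:
        derivative = (cf(h) - 2.0 * cf(0.0) + cf(-h)) / (h * h)
    else:
        raise ValueError("only orders 1 and 2 are sketched here")
    return derivative / (1j ** order)

lam = 2.5
first = moment_from_cf(lambda t: cf_poisson(t, lam), 1)
second = moment_from_cf(lambda t: cf_poisson(t, lam), 2)
print(first, second)  # compare with E[X] = lam and E[X^2] = lam + lam**2
```

The imaginary parts of the printed estimates should be negligible, and the real parts close to $\lambda$ and $\lambda + \lambda^2$ respectively.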






10.2.2 Inversion 

The main result about characteristic functions is that random variables whose characteristic functions are equal are identically distributed, i.e. $\varphi_X = \varphi_Y$ if and only if $\mu_X = \mu_Y$. The distribution of a random variable can be recovered from its characteristic function. This result follows from the following theorem:

Theorem 10.2.9 (Lévy's Inversion Formula)

Let $\varphi$ be the characteristic function of a random variable $X$, which has distribution $\mu$ and distribution function $F$. If $a < b \in \mathbb{R}$, then

$$\lim_{T \to +\infty} \frac{1}{2\pi} \int_{-T}^{T} \frac{e^{-ita} - e^{-itb}}{it}\, \varphi(t)\, dt = \tfrac{1}{2}\mu(\{a\}) + \mu(a, b) + \tfrac{1}{2}\mu(\{b\}) = \tfrac{1}{2}[F(b) + F(b-)] - \tfrac{1}{2}[F(a) + F(a-)]$$

Remarks 10.2.10 Note that if $F$ is continuous at $a$ and $b$, then $\mu(\{a\}) = \mu(\{b\}) = 0$. In that case we have

$$\lim_{T \to +\infty} \frac{1}{2\pi} \int_{-T}^{T} \frac{e^{-ita} - e^{-itb}}{it}\, \varphi(t)\, dt = \mu(a, b) = F(b) - F(a)$$

□ 



It follows from Lévy's Inversion Formula that the distribution of a random variable $X$ is determined by its characteristic function. For suppose that $X, Y$ are random variables which have the same characteristic function $\varphi$. Let $\mu_X$ be the distribution of $X$, and let $\mu_Y$ be the distribution of $Y$. Then for any real numbers $a < b$, we must have

$$\tfrac{1}{2}\mu_X(\{a\}) + \mu_X(a, b) + \tfrac{1}{2}\mu_X(\{b\}) = \tfrac{1}{2}\mu_Y(\{a\}) + \mu_Y(a, b) + \tfrac{1}{2}\mu_Y(\{b\})$$

There are at most countably many real places where a distribution function can be discontinuous. It therefore follows that there are sequences $a_n \downarrow a$ and $b_n \uparrow b$ such that both $F_X, F_Y$ are continuous at each $a_n$ and $b_n$. It follows by Remarks 10.2.10 that $\mu_X(a_n, b_n) = \mu_Y(a_n, b_n)$ for each $n$. Now since $(a_n, b_n) \uparrow (a, b)$, by continuity of measure we have

$$\mu_X(a, b) = \mu_Y(a, b)$$

Thus $\mu_X, \mu_Y$ are finite measures which agree on all open intervals. But the open intervals form a $\pi$-system which generates the Borel algebra, and thus $\mu_X$ must agree with $\mu_Y$ on every Borel set. We have shown:

Theorem 10.2.11 Two random variables have the same characteristic functions if and only if they have the same distribution.

Examples 10.2.12 (a) If a random variable $X$ has a characteristic function $\varphi_X(t) = e^{-t^2/2}$, then $X$ must be normally distributed with mean 0 and variance 1. This follows from Example 7.3.8 and the Lévy Inversion Formula.






(b) Suppose that $X, Y$ are independent normally distributed random variables with mean 0 and variance 1. Then

$$\varphi_{X+Y}(t) = \varphi_X(t)\, \varphi_Y(t) = e^{-t^2}$$

Thus $X + Y$ has the same characteristic function as a normally distributed random variable with mean 0 and variance 2. Hence $X + Y$ is a normally distributed random variable with mean 0 and variance 2.

(c) If $\varphi_X(t)$ is real-valued (as opposed to complex-valued), then

$$\varphi_{-X}(t) = \overline{\varphi_X(t)} = \varphi_X(t)$$

Thus $\varphi_X$ is real if and only if $X$ and $-X$ are identically distributed, i.e. if and only if $X$ is distributed symmetrically about the origin. As a consequence, all odd moments must be zero (if they exist).



□ 



Before we prove that Levy's Inversion Formula holds, we need a definition and some 
lemmas: 



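Definition 10.2.13 Define the function $S(T)$ for $T > 0$ by

$$S(T) = \int_0^T \frac{\sin x}{x}\,dx$$

Also define the function $\operatorname{sgn}(x)$ by

$$\operatorname{sgn}(x) = \begin{cases} 1 & \text{if } x > 0 \\ 0 & \text{if } x = 0 \\ -1 & \text{if } x < 0 \end{cases}$$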

The function $x \mapsto x^{-1}\sin x$ itself is not Lebesgue integrable over the positive reals, because its positive
and negative parts both integrate to infinity (exercise!). Nevertheless, $\lim_{T\to+\infty}\int_0^T x^{-1}\sin x\,dx$
exists, because the sequence

$$\left(\int_{n\pi}^{(n+1)\pi} x^{-1}\sin x\,dx\right)_n$$

is alternating, decreases in absolute value, and goes to 0 as $n \to +\infty$. It is important to know what this limit is:

Lemma 10.2.14

$$\lim_{T\to+\infty}\int_0^T \frac{\sin x}{x}\,dx = \frac{\pi}{2}$$



Proof: Note that

$$\int_0^T e^{-ux}\sin x\,dx = \frac{1}{1+u^2}\left[1 - e^{-uT}(u\sin T + \cos T)\right]$$

which may be obtained by two successive applications of integration by parts. Now the
function $e^{-ux}\sin x$ is integrable over $(0,T)\times(0,+\infty)$, because the integral of its modulus is:

$$\int_0^T\int_0^\infty |e^{-ux}\sin x|\,du\,dx = \int_0^T x^{-1}|\sin x|\,dx \le T < +\infty$$






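using $|x^{-1}\sin x| \le 1$. We may therefore apply Fubini's theorem:

$$\int_0^T \frac{\sin x}{x}\,dx = \int_0^T \sin x\left[\int_0^\infty e^{-ux}\,du\right]dx = \int_0^\infty\left[\int_0^T e^{-ux}\sin x\,dx\right]du = \int_0^\infty \frac{du}{1+u^2} - \int_0^\infty \frac{e^{-uT}}{1+u^2}(u\sin T + \cos T)\,du$$

Clearly $\int_0^\infty \frac{du}{1+u^2} = \frac{\pi}{2}$. Furthermore

$$\left|\int_0^\infty \frac{e^{-uT}}{1+u^2}(u\sin T + \cos T)\,du\right| \le \int_0^\infty \frac{e^{-uT}}{1+u^2}\,|u\sin T + \cos T|\,du \le \int_0^\infty e^{-uT}(u+1)\,du \to 0 \quad\text{as } T \to \infty$$

by the Lebesgue Dominated Convergence Theorem, so that $\int_0^\infty \frac{e^{-uT}}{1+u^2}(u\sin T + \cos T)\,du \to 0$
as $T \to +\infty$. This is what we needed to establish.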
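The limit in Lemma 10.2.14 can be checked numerically. The following is only a sketch: it approximates $S(T)$ by a composite Simpson rule (the function names and the step size are illustrative choices, not part of the notes) and compares the result with $\pi/2$. The error term in the proof above suggests a gap of order $1/T$.

```python
import math

def sinc(x: float) -> float:
    """sin(x)/x, extended continuously by the value 1 at x = 0."""
    return 1.0 if x == 0.0 else math.sin(x) / x

def S(T: float, steps_per_unit: int = 200) -> float:
    """Approximate S(T) = integral of sin(x)/x over [0, T] with composite Simpson."""
    n = 2 * max(1, math.ceil(T * steps_per_unit / 2))  # even number of subintervals
    h = T / n
    total = sinc(0.0) + sinc(T)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * sinc(k * h)
    return total * h / 3

for T in (10.0, 100.0, 1000.0):
    value = S(T)
    print(T, value, abs(value - math.pi / 2))
```

The printed gaps shrink roughly like $1/T$, in line with the bound $\int_0^\infty e^{-uT}(u+1)\,du = 1/T + 1/T^2$ used in the proof.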



Another technical result that we shall need is the following:
Lemma 10.2.15

$$\frac{1}{2\pi}\int_{-T}^{T}\frac{e^{itx}}{it}\,dt = \frac{\operatorname{sgn}(x)\,S(|x|T)}{\pi}$$



Proof:

$$\int_{-T}^{T}\frac{\cos tx + i\sin tx}{it}\,dt = \int_{-T}^{T}\frac{\sin tx}{t}\,dt$$

because $\frac{\cos tx}{it}$ is an odd function of $t$. Moreover, using the fact that $\frac{\sin tx}{t}$ is even and a
change of variables $u = t|x|$, we obtain

$$\int_{-T}^{T}\frac{\sin tx}{t}\,dt = 2\operatorname{sgn}(x)\int_0^{T|x|}\frac{\sin u}{u}\,du$$

from which the result now follows trivially.



The final lemma will be useful in giving upper bounds for certain functions:

Lemma 10.2.16 ("circle inequality")
For $u, v \in \mathbb{R}$ with $u < v$ we have

$$|e^{iv} - e^{iu}| \le v - u$$






Proof: This follows from the fact that

$$|e^{iv} - e^{iu}| = \left|\int_u^v ie^{it}\,dt\right| \le \int_u^v |ie^{it}|\,dt = v - u$$

(or more simply from a diagram: Just draw $e^{iu}$ and $e^{iv}$ on the unit circle in the complex plane
and think about it.)



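We may now prove the Levy Inversion Formula:

Proof of Levy's Inversion Formula:

If $a < b$ in $\mathbb{R}$, and if $0 < T < +\infty$, then

$$\frac{1}{2\pi}\int_{-T}^{T}\frac{e^{-ita}-e^{-itb}}{it}\,\varphi(t)\,dt = \frac{1}{2\pi}\int_{-T}^{T}\frac{e^{-ita}-e^{-itb}}{it}\left(\int_{\mathbb{R}} e^{itx}\,\mu(dx)\right)dt = \frac{1}{2\pi}\int_{\mathbb{R}}\left(\int_{-T}^{T}\frac{e^{it(x-a)}-e^{it(x-b)}}{it}\,dt\right)\mu(dx)$$

by Fubini's Theorem, since the integral of the absolute value is finite: By Tonelli's Theorem,

$$\frac{1}{2\pi}\int_{[-T,T]\times\mathbb{R}}\left|\frac{e^{it(x-a)}-e^{it(x-b)}}{it}\right|(\lambda\otimes\mu)(dt,dx) = \frac{1}{2\pi}\int_{\mathbb{R}}\int_{-T}^{T}\left|\frac{e^{it(x-a)}-e^{it(x-b)}}{it}\right|dt\,\mu(dx) \le \frac{1}{2\pi}\int_{\mathbb{R}}\left(\int_{-T}^{T}|b-a|\,dt\right)\mu(dx) = \frac{(b-a)T}{\pi}$$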



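using the inequality on the unit circle above.
Using one of the lemmas above, we obtain

$$\frac{1}{2\pi}\int_{-T}^{T}\frac{e^{it(x-a)}-e^{it(x-b)}}{it}\,dt = \frac{\operatorname{sgn}(x-a)S(|x-a|T) - \operatorname{sgn}(x-b)S(|x-b|T)}{\pi}$$

Now as $U \uparrow +\infty$, $S(U) \to \frac{\pi}{2}$, and thus

$$\lim_{T\to+\infty}\frac{\operatorname{sgn}(x-a)S(|x-a|T) - \operatorname{sgn}(x-b)S(|x-b|T)}{\pi} = \begin{cases} 0 & \text{if } x < a \\ \tfrac12 & \text{if } x = a \\ 1 & \text{if } a < x < b \\ \tfrac12 & \text{if } x = b \\ 0 & \text{if } b < x \end{cases}$$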






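Hence

$$\int_{\mathbb{R}}\lim_{T\to+\infty}\frac{\operatorname{sgn}(x-a)S(|x-a|T) - \operatorname{sgn}(x-b)S(|x-b|T)}{\pi}\,\mu(dx) = \tfrac12\mu(\{a\}) + \mu(a,b) + \tfrac12\mu(\{b\})$$

It follows that

$$\lim_{T\to+\infty}\frac{1}{2\pi}\int_{-T}^{T}\frac{e^{-ita}-e^{-itb}}{it}\,\varphi(t)\,dt = \lim_{T\to+\infty}\int_{\mathbb{R}}\frac{\operatorname{sgn}(x-a)S(|x-a|T) - \operatorname{sgn}(x-b)S(|x-b|T)}{\pi}\,\mu(dx)$$

$$= \int_{\mathbb{R}}\lim_{T\to+\infty}\frac{\operatorname{sgn}(x-a)S(|x-a|T) - \operatorname{sgn}(x-b)S(|x-b|T)}{\pi}\,\mu(dx) = \tfrac12\mu(\{a\}) + \mu(a,b) + \tfrac12\mu(\{b\})$$

where we used the Lebesgue Dominated Convergence Theorem to take the limit inside the
integral (the integrands are uniformly bounded, because $S$ is bounded). This proves the result.
Note also that

$$\tfrac12\left[F(b) + F(b-) - F(a) - F(a-)\right] = \tfrac12\left[\mu(a,b] + \mu[a,b)\right]$$

⊣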
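As a sanity check on the inversion formula, one can evaluate the truncated integral numerically for a distribution whose answer is known. The sketch below assumes $X$ is standard normal, so $\varphi(t) = e^{-t^2/2}$ and the limit should equal $\Phi(b) - \Phi(a)$ since there are no atoms. The midpoint rule, the cut-off $T = 10$ and all function names are illustrative choices.

```python
import cmath
import math

def inversion_integral(phi, a: float, b: float, T: float, n: int = 20000) -> complex:
    """Midpoint-rule value of (1/2pi) * int_{-T}^{T} (e^{-ita} - e^{-itb})/(it) * phi(t) dt."""
    assert n % 2 == 0  # with n even, no midpoint falls on the removable singularity t = 0
    h = 2 * T / n
    total = 0j
    for k in range(n):
        t = -T + (k + 0.5) * h
        total += (cmath.exp(-1j * t * a) - cmath.exp(-1j * t * b)) / (1j * t) * phi(t)
    return total * h / (2 * math.pi)

def std_normal_cf(t: float) -> float:
    return math.exp(-t * t / 2)

def std_normal_cdf(x: float) -> float:
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

a, b = -1.0, 0.5
approx = inversion_integral(std_normal_cf, a, b, T=10.0)
exact = std_normal_cdf(b) - std_normal_cdf(a)
print(approx.real, exact, approx.imag)
```

The imaginary part vanishes up to rounding because the integrand at $-t$ is the complex conjugate of the integrand at $t$.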

An extremely useful (but slightly weaker) version of the inversion theorem is the following:



Theorem 10.2.17 Let $\varphi$ be the characteristic function of a random
variable $X$, which has distribution $\mu$ and distribution function $F$. If

$$\int_{\mathbb{R}}|\varphi(t)|\,dt < +\infty$$

then $X$ has a continuous probability density function $f_X$, and

$$f_X(x) = \frac{1}{2\pi}\int_{\mathbb{R}} e^{-itx}\varphi(t)\,dt$$



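Exercise 10.2.18 We prove Thm. 10.2.17.

(a) Show that $\int_{\mathbb{R}}\left|\frac{e^{-ita}-e^{-itb}}{it}\,\varphi(t)\right|dt \le |b-a|\int_{\mathbb{R}}|\varphi(t)|\,dt$ and conclude that
$\frac{e^{-ita}-e^{-itb}}{it}\,\varphi(t)$ is integrable over $\mathbb{R}$ (with respect to $t$).

(b) Explain why

$$F(b) - F(a) \le \frac{|b-a|}{2\pi}\int_{\mathbb{R}}|\varphi(t)|\,dt$$

and why this, together with the fact that $\int_{\mathbb{R}}|\varphi(t)|\,dt < +\infty$, immediately implies that $F$ is continuous.

(c) Explain why

$$F(b) - F(a) = \frac{1}{2\pi}\int_{\mathbb{R}}\frac{e^{-ita}-e^{-itb}}{it}\,\varphi(t)\,dt$$

for all $a < b \in \mathbb{R}$.

(d) Deduce that for all $x$ and all $h \ne 0$, we have

$$\frac{F(x+h) - F(x)}{h} = \frac{1}{2\pi}\int_{\mathbb{R}}\frac{e^{-itx}-e^{-it(x+h)}}{ith}\,\varphi(t)\,dt$$

(e) Use the Lebesgue Dominated Convergence Theorem to conclude that

$$F'(x) = \frac{1}{2\pi}\int_{\mathbb{R}} e^{-itx}\varphi(t)\,dt$$

as required.

(f) Finally, explain why $f_X$ must be a continuous function.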

□ 

Remarks 10.2.19 If $X$ is a random variable with a continuous probability density function $f_X$, then
we have the following "duality":

$$\varphi_X(t) = \int_{\mathbb{R}} f_X(x)e^{itx}\,dx \qquad\qquad f_X(x) = \frac{1}{2\pi}\int_{\mathbb{R}}\varphi_X(t)e^{-itx}\,dt$$

Thus the density and characteristic functions are inverses of each other under Fourier transforms.

□ 
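The second half of the duality can be tried out numerically. This is a sketch only: it assumes the standard normal characteristic function $e^{-t^2/2}$, truncates the integral to $[-12, 12]$, and uses a midpoint rule; `density_from_cf` and `std_normal_pdf` are hypothetical helper names.

```python
import cmath
import math

def density_from_cf(phi, x: float, T: float = 12.0, n: int = 24000) -> float:
    """Approximate f(x) = (1/2pi) * int_{-T}^{T} e^{-itx} phi(t) dt by the midpoint rule."""
    h = 2 * T / n
    total = 0j
    for k in range(n):
        t = -T + (k + 0.5) * h
        total += cmath.exp(-1j * t * x) * phi(t)
    return (total * h / (2 * math.pi)).real

def std_normal_pdf(x: float) -> float:
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

for x in (0.0, 1.0, -2.3):
    print(x, density_from_cf(lambda t: math.exp(-t * t / 2), x), std_normal_pdf(x))
```

The two printed columns should agree to many digits, since the integrand is smooth and decays very quickly.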

10.2.3 Weak Convergence and Characteristic Functions

Our next result proves that weak convergence of measures is equivalent to pointwise convergence
of characteristic functions:

Theorem 10.2.20 (Levy Continuity Theorem)

Let $\mu_n$ be probability measures on $(\mathbb{R}, \mathcal{B})$, and let $\varphi_n$ be the associated
characteristic functions. Suppose that $(\varphi_n(t))_n$ converges for every $t \in \mathbb{R}$, and let

$$\psi(t) := \lim_n \varphi_n(t)$$

If $\psi$ is continuous at $t = 0$, then $\psi$ is the characteristic function of a
probability distribution $\mu$, and $\mu_n \Rightarrow \mu$ (weak convergence).

Proof: Suppose first that $(\mu_n)_n$ is tight. Then by the Helly-Bray Lemma there is a subsequence
$(\mu_{n_k})_k$ and a probability distribution $\mu$ such that $\mu_{n_k} \Rightarrow \mu$. Now $e^{itx} = \cos tx + i\sin tx$
is bounded and continuous, and hence, by definition of weak convergence, we have
$\int e^{itx}\,\mu_{n_k}(dx) \to \int e^{itx}\,\mu(dx)$, i.e. $\varphi_{n_k}(t) \to \varphi(t)$, where $\varphi$ is the characteristic function of $\mu$. But as
$\varphi_n(t) \to \psi(t)$, the same is true for any subsequence. Hence $\varphi(t) = \psi(t)$.

So far, we know only that $\mu_{n_k} \Rightarrow \mu$, but we would like to show that $\mu_n \Rightarrow \mu$. It is easier
to argue with distribution functions: If $\mu_n$ does not converge weakly to $\mu$, then $F_n \not\Rightarrow F$
(where, of course, $F_n, F$ are the distribution functions of $\mu_n, \mu$), which means that there is
a continuity point $x$ of $F$ such that $F_n(x) \not\to F(x)$. It follows that there is an $\varepsilon > 0$ and a
subsequence $(F_{m_j})_j$ such that $|F_{m_j}(x) - F(x)| \ge \varepsilon$ for all $j$. Applying the Helly-Bray Lemma to
this subsequence, we obtain a subsubsequence $(F_{m_{j_l}})_l$ which converges weakly, by tightness,
to some distribution function $G$, with characteristic function $\varphi_G$. But since $\varphi_n(t) \to \psi(t)$, we
must have $\varphi_{m_{j_l}}(t) \to \psi(t)$, and hence $\varphi_G(t) = \psi(t) = \varphi(t)$. Thus $G$ and $F$ have the same
characteristic function, and hence $G = F$, by Levy's Inversion Theorem. But this leads to a
contradiction: As $F_{m_{j_l}}(x) \to G(x)$ and $G(x) = F(x)$, we have $F_{m_{j_l}}(x) \to F(x)$ as $l \to \infty$. Yet
$|F_{m_j}(x) - F(x)| \ge \varepsilon$ for all $j \in \mathbb{N}$, and thus we have both

$$|F_{m_{j_l}}(x) - F(x)| \ge \varepsilon \text{ for all } l \qquad\text{and}\qquad |F_{m_{j_l}}(x) - G(x)| < \varepsilon \text{ for all large } l$$

which is impossible.

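It just remains to show that the $(\mu_n)_n$ are tight: Now if $\delta > 0$, then

$$\delta^{-1}\int_{-\delta}^{\delta}\left(1 - \varphi_n(t)\right)dt = \delta^{-1}\int_{-\delta}^{\delta}\int_{\mathbb{R}}\left(1 - e^{itx}\right)\mu_n(dx)\,dt = \int_{\mathbb{R}}\delta^{-1}\int_{-\delta}^{\delta}\left(1 - e^{itx}\right)dt\,\mu_n(dx)$$

$$= \int_{\mathbb{R}} 2\left(1 - \frac{\sin\delta x}{\delta x}\right)\mu_n(dx) \ge \mu_n(\{x : |x| > 2\delta^{-1}\})$$

Here we used Fubini's Theorem to change the order of integration (which is permitted, since
$1 - e^{itx}$ is bounded by 2). We also used the fact that the last integrand is non-negative, that
$1 - \frac{\sin\delta x}{\delta x} \ge 1 - \frac{1}{|\delta x|}$, and that $1 - \frac{1}{|\delta x|} \ge \frac12$ when $|x| > 2\delta^{-1}$.

Now clearly $\psi(0) = \lim_n \varphi_n(0) = \lim_n 1 = 1$. As $\psi$ is assumed to be continuous at $t = 0$,
there is for every $\varepsilon > 0$ a $\delta > 0$ such that

$$\delta^{-1}\int_{-\delta}^{\delta}\left(1 - \psi(t)\right)dt < \varepsilon$$

Since $\varphi_n(t) \to \psi(t)$, the Lebesgue Dominated Convergence Theorem implies that there exists
an $N$ such that

$$\delta^{-1}\int_{-\delta}^{\delta}\left(1 - \varphi_n(t)\right)dt < \varepsilon \quad\text{for all } n \ge N$$

It follows that

$$\mu_n[-2\delta^{-1}, 2\delta^{-1}] > 1 - \varepsilon$$

for all $n \ge N$. Also pick $K_1, K_2, \dots, K_N > 0$ such that

$$\mu_n[-K_n, K_n] > 1 - \varepsilon \quad\text{for } n = 1, \dots, N$$

If $K := \max\{2\delta^{-1}, K_1, K_2, \dots, K_N\}$, then

$$\mu_n[-K, K] > 1 - \varepsilon \quad\text{for all } n \in \mathbb{N}$$

proving that $(\mu_n)_n$ is indeed tight.

⊣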






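Now if $\mu_n, \mu$ are probability measures on $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$, with characteristic functions $\varphi_n, \varphi$,
and $\mu_n \Rightarrow \mu$, then we must certainly have $\int e^{itx}\,\mu_n(dx) \to \int e^{itx}\,\mu(dx)$, by definition of weak
convergence, as $e^{itx} = \cos tx + i\sin tx$ is bounded and continuous. Thus

$$\mu_n \Rightarrow \mu \quad\text{implies}\quad \varphi_n(t) \to \varphi(t) \text{ for all } t \in \mathbb{R}$$

Conversely, if $\varphi_n(t) \to \varphi(t)$ for all $t \in \mathbb{R}$, then, by the previous theorem, $\mu_n \Rightarrow \mu$, as $\varphi$
(being a characteristic function) is continuous, and thus continuous at $t = 0$. We thus have:

Corollary 10.2.21 Suppose that $\mu_n, \mu$ are probability measures on
$(\mathbb{R}, \mathcal{B}(\mathbb{R}))$, with characteristic functions $\varphi_n, \varphi$. Then

$$\mu_n \Rightarrow \mu \quad\text{if and only if}\quad \varphi_n(t) \to \varphi(t) \text{ for all } t \in \mathbb{R}$$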



In the proof of the Levy Continuity Theorem, we proved the following useful inequality:

Proposition 10.2.22 If $\varphi$ is the characteristic function of a probability measure $\mu$ on $\mathbb{R}$,
and if $\delta > 0$, then

$$\delta^{-1}\int_{-\delta}^{\delta}\left(1 - \varphi(t)\right)dt \ge \mu(\{x : |x| > 2\delta^{-1}\})$$



□ 
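To see what Proposition 10.2.22 says in a concrete case, take $\mu$ to be the standard normal distribution (an assumption made purely for illustration). Both sides then have closed forms in terms of the error function: $\int_{-\delta}^{\delta} e^{-t^2/2}\,dt = \sqrt{2\pi}\,\operatorname{erf}(\delta/\sqrt2)$ and $\mu(\{|x| > c\}) = \operatorname{erfc}(c/\sqrt2)$. The helper names below are hypothetical.

```python
import math

def lhs(delta: float) -> float:
    """(1/delta) * int_{-delta}^{delta} (1 - exp(-t^2/2)) dt for the standard normal c.f."""
    return 2.0 - math.sqrt(2 * math.pi) * math.erf(delta / math.sqrt(2)) / delta

def rhs(delta: float) -> float:
    """mu({x : |x| > 2/delta}) for the standard normal distribution."""
    return math.erfc(math.sqrt(2) / delta)

for delta in (0.1, 0.5, 1.0, 2.0, 5.0, 50.0):
    print(delta, lhs(delta), rhs(delta))
```

For small $\delta$ the left side behaves like $\delta^2/3$ while the tail probability is astronomically small, which is why continuity of the characteristic function at 0 controls the tails.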



10.3 The Central Limit Theorem 

The Central Limit Theorem is one of the fundamental results in mathematics, and is easy to
prove with the machinery set up in the previous two sections. It states that if $(X_n)_n$ is a sequence
of independent identically distributed random variables with mean $\mu$ and variance $\sigma^2$, then the
distribution of the fractions $(X_1 + \dots + X_n - n\mu)/\sigma\sqrt{n}$ tends to a standard normal distribution
(with mean 0 and variance 1). Why the $\sqrt{n}$? Basically, if we define $S_n = X_1 + \dots + X_n$ then
it is clear that $S_n$ has mean $n\mu$ and variance $n\sigma^2$. Thus as $n \to +\infty$, $\operatorname{var}(S_n) \to +\infty$ as well.
However, $S_n/\sqrt{n}$ has variance $\operatorname{var}(S_n)/n = \sigma^2$.

We can conclude that if $n$ is sufficiently large, then $S_n$ is approximately normally distributed
with mean $n\mu$ and variance $n\sigma^2$. Thus the sums of identically distributed random
variables tend to become normally distributed.

Here is a concrete exercise which contains all the important elements of the proof of the 
Central Limit Theorem: 

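Exercise 10.3.1 (a) Suppose that $X$ is a random variable with

$$\mathbb{P}(X = 1) = \tfrac12 = \mathbb{P}(X = -1)$$

and that $\varphi_X$ is the characteristic function of $X$. Show that

$$\varphi_X(u) = \cos u$$

(b) Now let $X_n$ (for $n \in \mathbb{N}$) be independent random variables, all with the same distribution
as $X$ in (a). For $n \in \mathbb{N}$, define random variables $G_n$ by

$$G_n = \frac{X_1 + X_2 + \dots + X_n}{\sqrt{n}}$$

Show that the characteristic function $\varphi_{G_n}$ of $G_n$ is given by

$$\varphi_{G_n}(u) = \left(\cos\frac{u}{\sqrt{n}}\right)^n$$

(c) Use a Taylor expansion of $\cos x$ about $x = 0$ to show that

$$\cos h = 1 + h^2\left(-\tfrac12 + \varepsilon(h)\right) \quad\text{where } \varepsilon(h) \to 0 \text{ as } h \to 0$$

(d) Now define $k := k\left(\frac{u}{\sqrt{n}}\right) := -\tfrac12 + \varepsilon\left(\frac{u}{\sqrt{n}}\right)$, so that

$$k \to -\tfrac12 \text{ as } n \to \infty \quad\text{and}\quad \cos\frac{u}{\sqrt{n}} = 1 + \frac{k u^2}{n}$$

By considering $\ln\varphi_{G_n}(u)$, and with the aid of a Taylor expansion of $\ln(1 + x)$ about $x = 0$,
show that

$$\lim_{n\to\infty}\varphi_{G_n}(u) = e^{-\frac12 u^2}$$

(e) Now explain why we may deduce that $G_n \Rightarrow Z$, where $Z$ is a standard normal
random variable.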

□ 
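The limit in part (d) is easy to observe numerically. The sketch below simply evaluates $(\cos(u/\sqrt{n}))^n$ for growing $n$ and compares it with $e^{-u^2/2}$; the function names `cf_G` and `limit_cf` are illustrative.

```python
import math

def cf_G(n: int, u: float) -> float:
    """Characteristic function of G_n = (X_1 + ... + X_n)/sqrt(n) for fair +-1 steps."""
    return math.cos(u / math.sqrt(n)) ** n

def limit_cf(u: float) -> float:
    """Characteristic function of the standard normal distribution."""
    return math.exp(-u * u / 2)

u = 1.5
for n in (10, 100, 10_000, 1_000_000):
    print(n, cf_G(n, u), abs(cf_G(n, u) - limit_cf(u)))
```

Expanding $\ln\cos h = -h^2/2 - h^4/12 - \dots$ suggests that the gap shrinks like $1/n$, which matches the printed differences.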



Theorem 10.3.2 (Central Limit Theorem)

Let $N$ be a standard normally distributed random variable, with mean 0
and variance 1. Let $(X_n)_n$ be a sequence of independent identically
distributed random variables with mean $\mu$ and variance $\sigma^2 < +\infty$. Define
$S_n = X_1 + X_2 + \dots + X_n$ and set

$$G_n = \frac{S_n - n\mu}{\sigma\sqrt{n}}$$

Then

$$G_n \Rightarrow N$$



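Proof: It clearly suffices to prove the theorem for independent identically distributed $X_n$
with mean zero and variance 1, because if the $X_n$ have mean $\mu$ and variance $\sigma^2$, then
$\tilde X_n := (X_n - \mu)/\sigma$ have mean zero and variance 1, and then, with $\tilde S_n := \tilde X_1 + \dots + \tilde X_n$,

$$\frac{S_n - n\mu}{\sigma\sqrt{n}} = \frac{\tilde S_n}{\sqrt{n}}$$

So let $(X_n)_n$ be a sequence of independent identically distributed random variables with mean zero
and variance 1. Define

$$G_n = \frac{X_1 + X_2 + \dots + X_n}{\sqrt{n}}$$

Then because the $X_k$ are independent, we can say that

$$\varphi_{G_n}(t) = \prod_{k=1}^{n}\varphi_{X_k}\left(\frac{t}{\sqrt{n}}\right) = \left[\varphi\left(\frac{t}{\sqrt{n}}\right)\right]^n$$

where $\varphi$ is the common characteristic function of the $X_k$. Now if we consider a Taylor
expansion for $\varphi$ about $t = 0$, we'll obtain something like

$$\varphi(t) = \varphi(0) + \varphi'(0)t + \tfrac12\varphi''(0)t^2 + \varepsilon(t)t^2$$

where $\varepsilon(t) \to 0$ as $t \to 0$. Taylor's Theorem applies because $X$ has a second moment, so that
$\varphi$ is twice differentiable. Now examining the moments, we obtain

$$\varphi(0) = 1 \qquad \varphi'(0) = 0 \qquad \varphi''(0) = -1$$

Putting this back into the Taylor expansion for $\varphi$ yields

$$\varphi(t) = 1 - \tfrac12 t^2 + \varepsilon(t)t^2 = 1 + \left[-\tfrac12 + \varepsilon(t)\right]t^2 = 1 + k(t)t^2$$

where $k(t) \to -\tfrac12$ as $t \to 0$.
Thus

$$\varphi_{G_n}(t) = \left[1 + \frac{k(t/\sqrt{n})\,t^2}{n}\right]^n$$

Now recall that the first order Taylor expansion of $\ln(1 + z)$ about $z = 0$ yields

$$\ln(1 + z) = z + \tilde\varepsilon(z)|z| \quad\text{where } \tilde\varepsilon(z) \to 0 \text{ as } z \to 0$$

Put $z := \frac{k(t/\sqrt{n})\,t^2}{n}$, so that $z \to 0$ as $n \to \infty$, to obtain

$$\ln\varphi_{G_n}(t) = n\ln\left[1 + \frac{k(t/\sqrt{n})\,t^2}{n}\right] = n\ln(1 + z) = n\left(z + \tilde\varepsilon(z)|z|\right) = k(t/\sqrt{n})\,t^2 + \tilde\varepsilon(z)\,|k(t/\sqrt{n})|\,t^2$$

and hence

$$\lim_n \ln\varphi_{G_n}(t) = \lim_n k(t/\sqrt{n})\,t^2 + \lim_n \tilde\varepsilon(z)\,|k(t/\sqrt{n})|\,t^2 = -\tfrac12 t^2$$

It therefore follows that

$$\varphi_{G_n}(t) \to e^{-t^2/2} \quad\text{for all } t$$

By Levy's Continuity Theorem, the distributions $\mu_n$ of $G_n$ converge weakly to a distribution
whose characteristic function is $e^{-t^2/2}$. But this is the characteristic function of the standard
normal distribution, so by Levy's Inversion Formula it follows that $G_n$ converges weakly to a
standard normally distributed random variable.

⊣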
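The theorem can be observed directly on the coin-tossing example of Exercise 10.3.1, where the distribution function of $G_n$ is available exactly through binomial coefficients. The following sketch (with hypothetical function names) compares $\mathbb{P}(G_n \le x)$ for $n = 1000$ with the standard normal distribution function.

```python
import math

def cdf_G(n: int, x: float) -> float:
    """Exact P(G_n <= x), where G_n = (X_1 + ... + X_n)/sqrt(n) and P(X_k = +-1) = 1/2."""
    # S_n = 2B - n with B ~ Binomial(n, 1/2), so G_n <= x iff B <= (n + x*sqrt(n))/2.
    k_max = min(n, math.floor((n + x * math.sqrt(n)) / 2))
    if k_max < 0:
        return 0.0
    return sum(math.comb(n, k) for k in range(k_max + 1)) / 2**n

def std_normal_cdf(x: float) -> float:
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

n = 1000
for x in (-1.5, 0.3, 1.0):
    print(x, cdf_G(n, x), std_normal_cdf(x))
```

The discrepancies are of order $1/\sqrt{n}$, which is the rate one expects for lattice-valued summands.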



Chapter 11 



Conditional Expectation and 
Martingales 



11.1 Information and Expectation 

Because a thorough understanding of conditioning is absolutely fundamental to the development
and understanding of martingales, stochastic integrals and the tools of arbitrage pricing,
we take another, gentler, look at how information is organised and used in probability theory.
It is clear that new information will cause a re-evaluation of probabilities, and thus the
expected values of random variables. Information does not arrive all at once, but in dribs and
drabs over time. Information is contained in $\sigma$-algebras, and the flow of information will be
modelled by an increasing chain of $\sigma$-algebras. Given a random variable such as the stock
price $S_T$ at some later time $T$, we have an initial expectation $\mathbb{E}_0 S_T$ of what the stock price
will be, but as we observe the stock price over the interval $[0, T]$, our expectation changes: We
get a sequence $\mathbb{E}_t S_T$ of expectations at time $t$. At time $T$, we know the stock price exactly,
and so $\mathbb{E}_T S_T = S_T$, i.e. the expected value at time $T$ is the actual value. It will turn out that
under an equivalent martingale measure we have

$$\mathbb{E}_t \bar S_T = \bar S_t$$

i.e. at time $t$, the (risk-neutral) expected value of discounted $S_T$ is just the discounted value
of $S_t$. Here the discounting is done back to the period $t = 0$.

The right tool to deal with these problems is the notion of conditional expectation, due to
Kolmogorov. This is regarded by many as the central concept of probability theory.

11.1.1 Conditioning on an Event

The first, and simplest, case that we consider is that of the conditional expectation $\mathbb{E}(X|A)$ of
a random variable given an event $A$. Recall that $\mathbb{E}X$ is just the Lebesgue integral of $X$ with
respect to the probability measure $\mathbb{P}$, and that this essentially amounts to taking a weighted
sum of the values of the random variable, where the weights are the probabilities: In the
discrete framework, with $\Omega = \{\omega_n : n = 1, 2, \dots\}$, we have

$$\mathbb{E}(X) = \int_\Omega X\,d\mathbb{P} = \sum_n \mathbb{P}(\{\omega_n\})X(\omega_n)$$






This makes sense from the frequentists' point of view: If we perform the random experiment
a large number $N$ of times, the outcome $\omega_n$ occurs roughly $N\mathbb{P}(\{\omega_n\})$-many times. Thus we
observe a value of $X(\omega_n)$ roughly $N\mathbb{P}(\{\omega_n\})$ times. The average value of $X$ is just the sum
over all experiments of all the values of $X$ divided by the number of experiments, i.e.

$$\frac{1}{N}\sum_n N\mathbb{P}(\{\omega_n\})X(\omega_n) = \mathbb{E}X$$

Note that we can also write this as

$$\mathbb{E}X = \int_\Omega X\,d\mathbb{P}$$

It is just the weighted average of $X$ over the set $\Omega$. It will often be useful to take the weighted
average of $X$ over a subset $B$ of $\Omega$. To that end we define:

$$\mathbb{E}(X; B) = \int_B X\,d\mathbb{P} = \mathbb{E}[XI_B]$$

(Recall that $\int_B X\,d\mathbb{P}$ is defined as $\int XI_B\,d\mathbb{P}$. Since $I_B$ equals 1 on $B$, and 0 outside $B$, this
corresponds to finding the weighted average of $X$, but only for those values for which the
outcome lies in $B$.)

In the discrete framework we obviously have:

$$\mathbb{E}(X; B) = \sum_{\omega\in B}\mathbb{P}(\{\omega\})X(\omega)$$

Note also that $\mathbb{E}X = \mathbb{E}(X; \Omega)$.

Suppose now that we are given the information that the event $A$ occurred. In that case
the only possible values of $X$ that we will observe are of the form $X(\omega)$, $\omega \in A$. We will
therefore not generally be able to observe all the possible values of $X$, but just those $X(\omega)$
for which $\omega \in A$.

How should we define $\mathbb{E}(X|A)$, the expected value of $X$ given $A$? Let's try the frequentists'
approach for guidance: Suppose that a superlarge number $M$ of random trials is performed.
The event $A$ won't occur every single time, but if $M$ is superlarge, there should still be a
large number $N \approx M\mathbb{P}(A)$ of times in which the outcome does lie in $A$.¹ If $\omega \in A$, we will
therefore see an outcome $X(\omega)$ roughly $M\mathbb{P}(\{\omega\})$-many times. Thus the average value of $X$
given that $A$ occurs is simply the sum of the values of $X$ over all experiments for which $A$ occurs,
divided by the number of times that $A$ occurs. Thus we ought to have

$$\mathbb{E}(X|A) = \frac{1}{M\mathbb{P}(A)}\sum_{\omega\in A} M\mathbb{P}(\{\omega\})X(\omega) = \frac{\sum_{\omega\in A}\mathbb{P}(\{\omega\})X(\omega)}{\mathbb{P}(A)} = \frac{\mathbb{E}(X; A)}{\mathbb{P}(A)}$$

We therefore define:

¹ Unless $\mathbb{P}(A) = 0$.

Definition 11.1.1 If $X$ is a random variable and if $A$ is an event with $\mathbb{P}(A) \ne 0$, then the
conditional expectation of $X$ given $A$ is defined by:

$$\mathbb{E}(X|A) = \frac{\mathbb{E}(X; A)}{\mathbb{P}(A)}$$

□ 

Example 11.1.2 Suppose that $X$ is a random variable and that $A$ is an event with positive
probability. Before we know that $A$ has occurred, we have a best estimate of what $X$ will
be, namely $\mathbb{E}X$. If we are told that $A$ has occurred, however, we will revise our expectation.
For example, the expected value if a fair die is rolled is 3.5. However, if we are told that the
outcome is even, we revise our estimate: the expected value is now 4. If we are told that
the outcome is odd, the expected value is 3. Essentially, when we are told that an event $A$
has occurred, we revise our probability measure: Each outcome outside $A$ is now assigned
measure 0, whereas each event inside $A$ has its probability scaled up by dividing by $\mathbb{P}(A)$:
$\mathbb{P}(B|A) = \frac{\mathbb{P}(B)}{\mathbb{P}(A)}$ for all $B \subseteq A$. We use the new measure $\mathbb{P}(\cdot|A)$ on the sample space $A$ to
calculate our revised expectation: The expectation of $X$ given $A$ is

$$\mathbb{E}(X|A) = \frac{\mathbb{E}[X; A]}{\mathbb{P}(A)}$$

Note that $\mathbb{P}(B) = \mathbb{E}I_B$ for every event $B$. Generalizing, we can define the conditional
probability of the event $B$ given $A$ by

$$\mathbb{P}(B|A) = \mathbb{E}(I_B|A)$$

□ 
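The die computations in Example 11.1.2 can be reproduced mechanically from Definition 11.1.1. The sketch below works on the finite sample space $\{1,\dots,6\}$ with exact rational arithmetic; the helper names (`expectation_on`, `cond_expectation`, `cond_prob`) are illustrative and not taken from the notes.

```python
from fractions import Fraction

# Fair die: sample space {1, ..., 6} with uniform probabilities.
OMEGA = range(1, 7)
P = {w: Fraction(1, 6) for w in OMEGA}

def expectation_on(X, A):
    """E[X; A] = sum over w in A of P({w}) * X(w)."""
    return sum(P[w] * X(w) for w in A)

def prob(A):
    return sum(P[w] for w in A)

def cond_expectation(X, A):
    """E[X | A] = E[X; A] / P(A); undefined when P(A) = 0."""
    if prob(A) == 0:
        raise ValueError("P(A) must be non-zero")
    return expectation_on(X, A) / prob(A)

def cond_prob(B, A):
    """P(B | A) = E[I_B | A]."""
    return cond_expectation(lambda w: 1 if w in B else 0, A)

X = lambda w: w            # the face value of the die
EVEN, ODD = [2, 4, 6], [1, 3, 5]
print(expectation_on(X, OMEGA), cond_expectation(X, EVEN), cond_expectation(X, ODD))
```

With $B = \{2\}$ and $A$ the even outcomes, `cond_prob` returns $\mathbb{P}(B)/\mathbb{P}(A)$, as in the rescaling described in the example.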



11.1.2 Conditioning on a Random Variable 

Next we tackle a slightly more complicated version of conditional expectation, namely the
conditional expectation of the random variable $X$ given the random variable $Y$, denoted
$\mathbb{E}(X|Y)$. The idea is that if we know the values of the random variable $Y$, this may give us
information about the random variable $X$, unless they are independent.

Suppose first that we are working with a discrete probability space $\Omega = \{\omega_n : n = 1, 2, \dots\}$.
Let $\{y_m : m = 1, 2, \dots\}$ be the set of values of $Y$, where we assume that $\mathbb{P}(Y = y_m) \ne 0$.
We thus have a sequence of events $\{Y = y_m\}$, and for each of these we can calculate
the conditional expectation $\mathbb{E}(X|Y = y_m)$. Instead of regarding each of the

$$\mathbb{E}(X|Y = y_1),\ \mathbb{E}(X|Y = y_2),\ \mathbb{E}(X|Y = y_3),\ \dots$$

separately, we define instead a new random variable $\mathbb{E}(X|Y)$ as follows:

$$\mathbb{E}(X|Y)(\omega) = \mathbb{E}(X|Y = Y(\omega))$$

i.e. $\mathbb{E}(X|Y)(\omega) = \mathbb{E}(X|Y = y_m)$ if $Y(\omega) = y_m$.






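Example 11.1.3 A fair die is rolled, and you get an amount equal to the outcome $\omega$, but
only if it is even. Let $X$ be your winnings. Let $Y, Z$ be defined as follows:

$$Y(\omega) = \begin{cases} 1 & \text{if } \omega \text{ is even} \\ 0 & \text{else} \end{cases} \qquad\qquad Z(\omega) = \begin{cases} 0.1234 & \text{if } \omega \text{ is even} \\ 10^{\cdots} & \text{else} \end{cases}$$

We calculate the random variable $\mathbb{E}[X|Y]$:

$$\mathbb{E}[X|Y](\omega) = \begin{cases} \dfrac{\mathbb{E}[X; Y = 1]}{\mathbb{P}(Y = 1)} & \text{if } Y(\omega) = 1 \\[2ex] \dfrac{\mathbb{E}[X; Y = 0]}{\mathbb{P}(Y = 0)} & \text{if } Y(\omega) = 0 \end{cases} \;=\; \begin{cases} 4 & \text{if } \omega \text{ is even} \\ 0 & \text{if } \omega \text{ is odd} \end{cases}$$

Now compare $\mathbb{E}[X|Y]$ and $\mathbb{E}[X|Z]$:

$$\mathbb{E}[X|Y](\omega) = \begin{cases} \dfrac{\mathbb{E}[X; Y = 1]}{\mathbb{P}(Y = 1)} & \text{if } Y(\omega) = 1 \\[2ex] \dfrac{\mathbb{E}[X; Y = 0]}{\mathbb{P}(Y = 0)} & \text{if } Y(\omega) = 0 \end{cases} \;=\; \begin{cases} \dfrac{\mathbb{E}[X; \{2,4,6\}]}{\mathbb{P}(\{2,4,6\})} & \text{if } \omega \in \{2,4,6\} \\[2ex] \dfrac{\mathbb{E}[X; \{1,3,5\}]}{\mathbb{P}(\{1,3,5\})} & \text{if } \omega \in \{1,3,5\} \end{cases}$$

$$= \begin{cases} \dfrac{\mathbb{E}[X; Z = 0.1234]}{\mathbb{P}(Z = 0.1234)} & \text{if } Z(\omega) = 0.1234 \\[2ex] \dfrac{\mathbb{E}[X; Z = 10^{\cdots}]}{\mathbb{P}(Z = 10^{\cdots})} & \text{if } Z(\omega) = 10^{\cdots} \end{cases} \;=\; \mathbb{E}[X|Z](\omega)$$

Even though $Y, Z$ are quite different, we see that $\mathbb{E}[X|Y] = \mathbb{E}[X|Z]$. This is because the
conditional expectation $\mathbb{E}[X|Y]$ depends not on the values of $Y$, but on the information in
$Y$, i.e. it depends on $\sigma(Y)$. It is obvious, in this case, that

$$\sigma(Y) = \sigma(\{1,3,5\}, \{2,4,6\}) = \sigma(Z)$$

i.e. $Y, Z$ contain the same information.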



□ 
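Example 11.1.3 can be replayed on a computer. In the sketch below the value that $Z$ takes on odd outcomes is a placeholder (`Z_ODD = 1000`), chosen only because the argument does not depend on it; any value different from 0.1234 would do. The function `cond_exp_given` is a hypothetical helper implementing $\mathbb{E}(X|Y)(\omega) = \mathbb{E}(X|Y = Y(\omega))$.

```python
from fractions import Fraction

OMEGA = range(1, 7)
P = {w: Fraction(1, 6) for w in OMEGA}

def cond_exp_given(X, Y):
    """Return the random variable w -> E[X | Y = Y(w)] on the finite space OMEGA."""
    blocks = {}
    for w in OMEGA:
        blocks.setdefault(Y(w), []).append(w)      # level sets {Y = y}
    value_on = {}
    for y, block in blocks.items():
        p_block = sum(P[w] for w in block)
        value_on[y] = sum(P[w] * X(w) for w in block) / p_block
    return lambda w: value_on[Y(w)]

X = lambda w: w if w % 2 == 0 else 0               # winnings
Y = lambda w: 1 if w % 2 == 0 else 0
Z_ODD = 1000                                       # placeholder value, an assumption
Z = lambda w: Fraction(1234, 10000) if w % 2 == 0 else Z_ODD

E_X_given_Y = cond_exp_given(X, Y)
E_X_given_Z = cond_exp_given(X, Z)
print([E_X_given_Y(w) for w in OMEGA])
print([E_X_given_Z(w) for w in OMEGA])
```

Both lists coincide because `cond_exp_given` only ever uses the level sets of its second argument, which is exactly the statement that the conditional expectation depends on $Y$ through $\sigma(Y)$.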






Now note that $\sigma(Y)$ is obviously generated by the partition $\{Y = y_1\}, \{Y = y_2\}, \dots$,
and that $\mathbb{E}(X|Y)$ is constant on each block (i.e. that $\mathbb{E}(X|Y)(\omega_1) = \mathbb{E}(X|Y)(\omega_2)$ if $\omega_1, \omega_2$
belong to the same block; this happens if $Y(\omega_1) = Y(\omega_2)$). It follows that

$$\mathbb{E}[X|Y] \text{ is } \sigma(Y)\text{-measurable}$$

The intuition behind this fact is simple: If we know $Y$, we can calculate $\mathbb{E}(X|Y)$. Thus all
the information needed to calculate $\mathbb{E}(X|Y)$ is contained in $\sigma(Y)$. We must therefore have
$\sigma(\mathbb{E}(X|Y)) \subseteq \sigma(Y)$.

Now each element of $\sigma(Y)$ is simply a (finite or countable) union of the sets $\{Y = y_m\}$ which make up
the partition which generates $\sigma(Y)$. Since $\mathbb{E}[X|Y]$ is a random variable, we can integrate it.
Note that on each block we have:

$$\mathbb{E}\big[\mathbb{E}[X|Y]; Y = y_m\big] = \mathbb{E}[X|Y = y_m]\cdot\mathbb{P}(Y = y_m) = \frac{\mathbb{E}[X; Y = y_m]}{\mathbb{P}[Y = y_m]}\,\mathbb{P}[Y = y_m] = \mathbb{E}[X; Y = y_m]$$

Since $\mathbb{E}(X; A) = \int_A X\,d\mathbb{P}$ is the "average" of the random variable $X$ over the set $A$, it follows
that $X, \mathbb{E}[X|Y]$ have the same average values over each set $\{Y = y_m\}$. In different notation:

$$\int_{\{Y = y_m\}}\mathbb{E}[X|Y]\,d\mathbb{P} = \int_{\{Y = y_m\}} X\,d\mathbb{P}$$

Again, this simply states that the random variables $X$ and $\mathbb{E}[X|Y]$ have the same average
over any set in the partition that generates $\sigma(Y)$. Since $\int_{F_1\cup F_2} X\,d\mathbb{P} = \int_{F_1} X\,d\mathbb{P} + \int_{F_2} X\,d\mathbb{P}$
if $F_1, F_2$ are disjoint, it now follows readily that $X, \mathbb{E}[X|Y]$ have the same average over any
set in $\sigma(Y)$: The elements of $\sigma(Y)$ are just disjoint unions of the blocks $\{Y = y_m\}$.
Thus we have shown the following:

Proposition 11.1.4 Let $X$ be a discrete integrable random variable (a)
and let $Y$ be an arbitrary random variable. Then the conditional
expectation $\mathbb{E}[X|Y]$ is a random variable with the properties that

(a) $\mathbb{E}[X|Y]$ is $\sigma(Y)$-measurable.

(b) $\int_A \mathbb{E}[X|Y]\,d\mathbb{P} = \int_A X\,d\mathbb{P}$ for all $A \in \sigma(Y)$.

□

(a) i.e. $\mathbb{E}X$ exists.

The above proposition makes it clear that $\mathbb{E}[X|Y]$ depends on $Y$ only through $\sigma(Y)$, and
not directly on the values of $Y$, i.e. that

Corollary 11.1.5 Suppose that $X, Y, Z$ are random variables, with $Y, Z$ discrete. If $\sigma(Y) = \sigma(Z)$
then $\mathbb{E}[X|Y] = \mathbb{E}[X|Z]$ a.s.

□ 
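Property (b) of Proposition 11.1.4 can be verified exhaustively on a small example, since $\sigma(Y)$ is then a finite collection of sets. The sketch below uses a fair die with $X(\omega) = \omega^2$ and $Y(\omega) = \omega \bmod 3$ (both chosen arbitrarily for illustration) and enumerates every union of blocks.

```python
from fractions import Fraction
from itertools import combinations

OMEGA = range(1, 7)
P = {w: Fraction(1, 6) for w in OMEGA}
X = lambda w: w * w
Y = lambda w: w % 3

# Blocks of the partition {Y = y} that generates sigma(Y).
blocks = {}
for w in OMEGA:
    blocks.setdefault(Y(w), []).append(w)

# E[X | Y] is constant on each block, equal to E[X; block] / P(block).
cond = {}
for block in blocks.values():
    avg = sum(P[w] * X(w) for w in block) / sum(P[w] for w in block)
    for w in block:
        cond[w] = avg

def sigma_Y():
    """Yield every set in sigma(Y), i.e. every union of blocks (including the empty set)."""
    block_list = list(blocks.values())
    for r in range(len(block_list) + 1):
        for chosen in combinations(block_list, r):
            yield [w for block in chosen for w in block]

def integral(f, A):
    return sum(P[w] * f(w) for w in A)

checks = [integral(lambda w: cond[w], A) == integral(X, A) for A in sigma_Y()]
print(len(checks), all(checks))
```

With three blocks there are $2^3$ sets in $\sigma(Y)$, and the partial-averaging identity holds on each of them.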

We have not, as yet, proved that $\mathbb{E}[X|Y]$ exists for general $Y$, but only for discrete $Y$ (i.e.
$Y$ with at most countably many values). However, as the definition of $\mathbb{E}[X|Y]$ depends on $Y$
only through $\sigma(Y)$, we proceed to generalize:






11.1.3 Conditioning on a σ-Algebra



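Definition and Theorem 11.1.6 (Kolmogorov)

Suppose that $(\Omega, \mathcal{F}, \mathbb{P})$ is a probability space and that $X$ is a random
variable in $\mathcal{L}^1(\Omega, \mathcal{F}, \mathbb{P})$. Let $\mathcal{G}$ be a sub-$\sigma$-algebra of $\mathcal{F}$. Then there
exists a random variable $Z$ such that

(i) $Z$ is $\mathcal{G}$-measurable.

(ii) $Z \in \mathcal{L}^1(\Omega, \mathcal{F}, \mathbb{P})$, i.e. $\mathbb{E}(|Z|) < +\infty$.

(iii) For every set $G \in \mathcal{G}$, we have

$$\mathbb{E}[Z; G] = \mathbb{E}[X; G] \quad\text{i.e.}\quad \int_G Z\,d\mathbb{P} = \int_G X\,d\mathbb{P}$$

Moreover, if $Z'$ is a random variable satisfying (i),(ii),(iii), then $Z = Z'$
a.s. Any random variable $Z$ with the properties (i),(ii),(iii) is called a
version of the conditional expectation of $X$ given $\mathcal{G}$. We write

$$Z = \mathbb{E}[X|\mathcal{G}] \text{ a.s.} \quad\text{or}\quad Z = \mathbb{E}^{\mathcal{G}}X \text{ a.s.}$$



Definition 11.1.7 We define conditional expectation w.r.t. general
random variables in the following manner:

$$\mathbb{E}[X|Y] := \mathbb{E}[X|\sigma(Y)]$$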



To prove that conditional expectations exist, we give a geometric argument, involving
approximation in a Hilbert space.

Before we start the second proof, recall that a Hilbert space $V$ is a vector space which
is equipped with an inner product, which we will denote $\langle v_1, v_2\rangle$. Now an inner product
automatically induces a norm (length), and angle

$$\|v\| = \langle v, v\rangle^{1/2} \qquad\qquad \cos\theta = \frac{\langle v_1, v_2\rangle}{\|v_1\|\,\|v_2\|}$$

Here $\theta$ is the angle between $v_1$ and $v_2$. We say that $v_1, v_2$ are orthogonal if $\langle v_1, v_2\rangle = 0$.
Hilbert spaces are also complete, i.e. every Cauchy sequence in $V$ converges (to a vector in
$V$).

Suppose that $W$ is a complete subspace of $V$. We then have the notion of orthogonal
projection onto $W$. Given any vector $v \in V$, there exists a decomposition

$$v = v^{\parallel} + v^{\perp}$$

into unique vectors with the following properties:

(1) $v^{\parallel} \in W$.

(2) $v^{\perp} \perp W$, i.e. $\langle v^{\perp}, w\rangle = 0$ for all $w \in W$.

(3) $\|v - v^{\parallel}\| = \inf\{\|v - w\| : w \in W\}$.

Thus $v^{\parallel}$ is the vector in $W$ which is the best approximation of $v$: It lies closer to $v$ than any
other $w \in W$. $v^{\parallel}$ is called the orthogonal projection of $v$ onto $W$.

Recall also that $\mathcal{L}^2(\Omega, \mathcal{F}, \mathbb{P})$ is a Hilbert space, with inner product $\langle X, Y\rangle = \mathbb{E}XY$ and
induced norm $\|X\|_2 = (\mathbb{E}X^2)^{1/2}$.

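Proof of Thm. 11.1.6: First assume that $X \in \mathcal{L}^2(\Omega, \mathcal{F}, \mathbb{P})$. Note that $\mathcal{L}^2(\Omega, \mathcal{G}, \mathbb{P})$ is a
closed subspace of $\mathcal{L}^2(\Omega, \mathcal{F}, \mathbb{P})$, and thus there exists a decomposition

$$X = Z + Y \quad\text{where } Z \in \mathcal{L}^2(\Omega, \mathcal{G}, \mathbb{P}) \text{ and } Y \perp \mathcal{L}^2(\Omega, \mathcal{G}, \mathbb{P})$$

Moreover, $\|X - Z\|_2 = \inf\{\|X - U\|_2 : U \in \mathcal{L}^2(\Omega, \mathcal{G}, \mathbb{P})\}$. Now $Z$ is clearly $\mathcal{G}$-measurable.
Also, if $G \in \mathcal{G}$, then $I_G \in \mathcal{L}^2(\Omega, \mathcal{G}, \mathbb{P})$ and so $Y \perp I_G$. Hence

$$\mathbb{E}[Z; G] = \langle Z, I_G\rangle = \langle X, I_G\rangle = \mathbb{E}[X; G] \quad\text{for all } G \in \mathcal{G}$$

It follows that $Z = \mathbb{E}[X|\mathcal{G}]$ a.s.

For $X \in \mathcal{L}^1(\Omega, \mathcal{F}, \mathbb{P})$, we use an approximation argument. First assume that $X \ge 0$,
and for $n \in \mathbb{N}$, define $X_n := X \wedge n$. Then $X_n \uparrow X$, and each $X_n \in \mathcal{L}^2(\Omega, \mathcal{F}, \mathbb{P})$. By
the above, there are $Z_n \in \mathcal{L}^2(\Omega, \mathcal{G}, \mathbb{P})$ such that $Z_n = \mathbb{E}[X_n|\mathcal{G}]$ a.s. Next, note that if
$n < m$, then $X_n \le X_m$, and thus $Z_n \le Z_m$ a.s.: For if $\varepsilon > 0$, and $G_\varepsilon := \{Z_n - Z_m > \varepsilon\}$,
then $G_\varepsilon \in \mathcal{G}$, so that $0 \le \varepsilon\cdot\mathbb{P}(G_\varepsilon) \le \mathbb{E}[Z_n - Z_m; G_\varepsilon] = \mathbb{E}[X_n - X_m; G_\varepsilon] \le 0$. Hence
$\mathbb{P}(G_\varepsilon) = 0$ for all $\varepsilon > 0$. Now $\{Z_n > Z_m\} = \bigcup_{k\in\mathbb{N}} G_{1/k}$, and hence $\mathbb{P}(Z_n > Z_m) = 0$, i.e.
$Z_n \le Z_m$ a.s., for each pair $n < m$. Taking the (countable) intersection over all such pairs
yields $\mathbb{P}((Z_n)_n \text{ is an increasing sequence}) = 1$, i.e. the sequence $(Z_n)_n$ is increasing a.s. Define
$Z = \limsup_n Z_n$. Then $Z$ is $\mathcal{G}$-measurable, and $Z_n \uparrow Z$ a.s. If $G \in \mathcal{G}$, then by two applications
of the MCT we have

$$\mathbb{E}[Z; G] = \lim_n \mathbb{E}[Z_n; G] = \lim_n \mathbb{E}[X_n; G] = \mathbb{E}[X; G]$$

Hence $Z = \mathbb{E}[X|\mathcal{G}]$ a.s.

The existence of $\mathbb{E}[X|\mathcal{G}]$ for integrable $X$ follows by decomposition into positive and
negative parts.

The a.s. uniqueness of $\mathbb{E}[X|\mathcal{G}]$ is straightforward: If $Z, Z'$ are two versions of $\mathbb{E}[X|\mathcal{G}]$, and
$\varepsilon > 0$, then $G_\varepsilon := \{Z - Z' > \varepsilon\} \in \mathcal{G}$ and hence $0 \le \varepsilon\cdot\mathbb{P}(G_\varepsilon) \le \mathbb{E}[Z - Z'; G_\varepsilon] = \mathbb{E}[X - X; G_\varepsilon] = 0$.
Arguing as above, we see that $\mathbb{P}(Z > Z') = 0$. By symmetry, $\mathbb{P}(Z' > Z) = 0$ as well, i.e.
$Z = Z'$ a.s.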
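On a finite probability space the geometric picture behind the proof can be checked by hand or by machine. The sketch below takes a six-point space with unequal weights and a $\sigma$-algebra generated by two blocks (all of these are arbitrary illustrative choices), computes the block averages $Z$, and verifies that $X - Z$ is orthogonal to the indicators of the blocks and that $Z$ beats another $\mathcal{G}$-measurable candidate $U$ in mean square.

```python
from fractions import Fraction as F

# A six-point probability space with unequal weights (they sum to 1).
P = {0: F(1, 12), 1: F(1, 12), 2: F(1, 6), 3: F(1, 6), 4: F(1, 4), 5: F(1, 4)}
BLOCKS = [[0, 1, 2], [3, 4, 5]]          # partition generating the sub-sigma-algebra G
X = {w: F(w * w) for w in P}

def inner(U, V):
    """Inner product <U, V> = E[UV] on L^2 of the finite space."""
    return sum(P[w] * U[w] * V[w] for w in P)

def project(V):
    """Block averages of V, i.e. the candidate for E[V | G]."""
    Z = {}
    for block in BLOCKS:
        avg = sum(P[w] * V[w] for w in block) / sum(P[w] for w in block)
        for w in block:
            Z[w] = avg
    return Z

def sq_dist(A, B):
    diff = {w: A[w] - B[w] for w in P}
    return inner(diff, diff)

Z = project(X)
residual = {w: X[w] - Z[w] for w in P}
indicators = [{w: F(int(w in block)) for w in P} for block in BLOCKS]
U = {w: F(3) if w in BLOCKS[0] else F(-1) for w in P}   # some other G-measurable variable

print([inner(residual, I) for I in indicators])
print(sq_dist(X, Z), sq_dist(X, U))
```

Because $Z - U$ is constant on blocks and the residual is orthogonal to such variables, the Pythagorean identity $\|X - U\|^2 = \|X - Z\|^2 + \|Z - U\|^2$ holds exactly here.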






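Theorem 11.1.8 (Properties of Conditional Expectation)
The following are true for random variables on a probability space
$(\Omega, \mathcal{F}, \mathbb{P})$ whenever the expressions occurring inside a conditional
expectation are integrable.

(a) $\mathbb{E}[\mathbb{E}[X|\mathcal{G}]] = \mathbb{E}X$;

(b) If $X$ is $\mathcal{G}$-measurable, then $\mathbb{E}[X|\mathcal{G}] = X$ a.s.

(c) LINEARITY: $\mathbb{E}[a_1X_1 + a_2X_2|\mathcal{G}] = a_1\mathbb{E}[X_1|\mathcal{G}] + a_2\mathbb{E}[X_2|\mathcal{G}]$ a.s.

(d) POSITIVITY: If $X \ge 0$, then $\mathbb{E}[X|\mathcal{G}] \ge 0$ a.s.

(e) cMCT: If $0 \le X_n \uparrow X$, then $\mathbb{E}[X_n|\mathcal{G}] \uparrow \mathbb{E}[X|\mathcal{G}]$ a.s.

(f) cFATOU: If $X_n \ge 0$, then $\mathbb{E}[\liminf_n X_n|\mathcal{G}] \le \liminf_n \mathbb{E}[X_n|\mathcal{G}]$ a.s.

(g) cDCT: If $|X_n| \le Y$ (all $n \in \mathbb{N}$) for some integrable $Y$, and if
$X_n \to X$ a.s., then $\mathbb{E}[X_n|\mathcal{G}] \to \mathbb{E}[X|\mathcal{G}]$ a.s.

(h) PROJECTION: $\mathbb{E}[X\cdot\mathbb{E}[Y|\mathcal{G}]] = \mathbb{E}[\mathbb{E}[X|\mathcal{G}]\cdot Y] = \mathbb{E}[\mathbb{E}[X|\mathcal{G}]\cdot\mathbb{E}[Y|\mathcal{G}]]$.

(i) If $Y$ is $\mathcal{G}$-measurable, then $\mathbb{E}[YX|\mathcal{G}] = Y\,\mathbb{E}[X|\mathcal{G}]$ a.s.

(j) TOWER: If $\mathcal{H} \subseteq \mathcal{G}$, then $\mathbb{E}[\mathbb{E}[X|\mathcal{G}]|\mathcal{H}] = \mathbb{E}[\mathbb{E}[X|\mathcal{H}]|\mathcal{G}] = \mathbb{E}[X|\mathcal{H}]$ a.s.

(k) INDEPENDENCE: If $\mathcal{H}$ is independent of $\sigma(X) \vee \mathcal{G}$, then

$$\mathbb{E}[X|\mathcal{G} \vee \mathcal{H}] = \mathbb{E}[X|\mathcal{G}] \text{ a.s.}$$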



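Exercise 11.1.9 Prove Thm. 11.1.8(a), (b), (c), (d).

[Hint: (c) means that if $Y_k$ are versions of $\mathbb{E}[X_k|\mathcal{G}]$ for $k = 1, 2$, then $a_1Y_1 + a_2Y_2$ is a version
of $\mathbb{E}[a_1X_1 + a_2X_2|\mathcal{G}]$.

For (d), let $Z = \mathbb{E}[X|\mathcal{G}]$ a.s., and note that $\{Z < 0\} = \bigcup_{n\in\mathbb{N}}\{Z < -\tfrac1n\} \in \mathcal{G}$.]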

□ 



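Proof of Thm. 11.1.8(e)-(k):

(e): Suppose that $0 \le X_n \uparrow X$, and define $Y_n := \mathbb{E}[X_n|\mathcal{G}]$ a.s. By (d), $(Y_n)_n$ is increasing a.s.
Define $Y = \limsup_n Y_n$, so that $Y$ is $\mathcal{G}$-measurable and $Y_n \uparrow Y$ a.s. If $G \in \mathcal{G}$, then by the
MCT, $\mathbb{E}[X; G] = \lim_n \mathbb{E}[X_n; G] = \lim_n \mathbb{E}[Y_n; G] = \mathbb{E}[Y; G]$. Hence $Y = \mathbb{E}[X|\mathcal{G}]$ a.s.

(f): Let $Z_n := \inf_{k\ge n} X_k$, so that $Z_n \uparrow \liminf_n X_n$. Since $Z_n \le X_k$ whenever $k \ge n$,
we have $\mathbb{E}[Z_n|\mathcal{G}] \le \mathbb{E}[X_k|\mathcal{G}]$ a.s. whenever $k \ge n$, and hence $\mathbb{E}[Z_n|\mathcal{G}] \le \inf_{k\ge n}\mathbb{E}[X_k|\mathcal{G}]$ a.s.
Now by cMCT,

$$\mathbb{E}[\liminf_n X_n|\mathcal{G}] = \lim_n \mathbb{E}[Z_n|\mathcal{G}] \le \lim_n \inf_{k\ge n}\mathbb{E}[X_k|\mathcal{G}] = \liminf_n \mathbb{E}[X_n|\mathcal{G}]$$

(g): $Y \pm X_n$ are non-negative random variables, so by cFATOU,

$$\mathbb{E}[Y|\mathcal{G}] + \liminf_n\left(\pm\mathbb{E}[X_n|\mathcal{G}]\right) = \liminf_n \mathbb{E}[Y \pm X_n|\mathcal{G}] \ge \mathbb{E}[\liminf_n (Y \pm X_n)|\mathcal{G}] = \mathbb{E}[Y|\mathcal{G}] \pm \mathbb{E}[X|\mathcal{G}] \text{ a.s.}$$

Since $\mathbb{E}[Y|\mathcal{G}]$ is integrable, it is finite a.s., and hence can be cancelled to yield
$\liminf_n(\pm\mathbb{E}[X_n|\mathcal{G}]) \ge \pm\mathbb{E}[X|\mathcal{G}]$, which implies

$$\mathbb{E}[X|\mathcal{G}] \le \liminf_n \mathbb{E}[X_n|\mathcal{G}] \le \limsup_n \mathbb{E}[X_n|\mathcal{G}] \le \mathbb{E}[X|\mathcal{G}] \text{ a.s.}$$

(h): This follows from the usual properties of projections if $X, Y \in \mathcal{L}^2(\Omega, \mathcal{F}, \mathbb{P})$: For
suppose that $X = X_\parallel + X_\perp$, $Y = Y_\parallel + Y_\perp$ are decompositions of $X, Y$ into components
parallel and perpendicular to $\mathcal{L}^2(\Omega, \mathcal{G}, \mathbb{P})$, so that $X_\parallel = \mathbb{E}[X|\mathcal{G}]$, $Y_\parallel = \mathbb{E}[Y|\mathcal{G}]$ (by the second
proof of Thm. 11.1.6). Then

$$\mathbb{E}[X\cdot\mathbb{E}[Y|\mathcal{G}]] = \langle X, Y_\parallel\rangle = \langle X_\parallel, Y_\parallel\rangle = \mathbb{E}[\mathbb{E}[X|\mathcal{G}]\cdot\mathbb{E}[Y|\mathcal{G}]]$$

because $\langle X_\perp, Y_\parallel\rangle = 0$. If $X, Y \in \mathcal{L}^1(\Omega, \mathcal{F}, \mathbb{P})$ are non-negative, we may define $X_n := X \wedge n$,
$Y_n := Y \wedge n$. Then $0 \le X_n \uparrow X$ and $0 \le Y_n \uparrow Y$, and $X_n, Y_n \in \mathcal{L}^2(\Omega, \mathcal{F}, \mathbb{P})$. It follows by
the MCT and cMCT that

$$\mathbb{E}[X\cdot\mathbb{E}[Y|\mathcal{G}]] = \lim_n \mathbb{E}[X_n\cdot\mathbb{E}[Y_n|\mathcal{G}]] = \lim_n \mathbb{E}[\mathbb{E}[X_n|\mathcal{G}]\cdot\mathbb{E}[Y_n|\mathcal{G}]] = \mathbb{E}[\mathbb{E}[X|\mathcal{G}]\cdot\mathbb{E}[Y|\mathcal{G}]]$$

(i): If $Y = I_G$ is an indicator function with $G \in \mathcal{G}$, and $G' \in \mathcal{G}$, then

$$\mathbb{E}[Y\,\mathbb{E}[X|\mathcal{G}]; G'] = \mathbb{E}[\mathbb{E}[X|\mathcal{G}]; G \cap G'] = \mathbb{E}[X; G \cap G'] = \mathbb{E}[YX; G']$$

Hence $Y\,\mathbb{E}[X|\mathcal{G}]$ is a version of $\mathbb{E}[YX|\mathcal{G}]$. The result now follows by linearity and cMCT.

(j): Consider the case where $X \in \mathcal{L}^2(\Omega, \mathcal{F}, \mathbb{P})$. Since $\mathcal{L}^2(\Omega, \mathcal{H}, \mathbb{P}) \subseteq \mathcal{L}^2(\Omega, \mathcal{G}, \mathbb{P}) \subseteq
\mathcal{L}^2(\Omega, \mathcal{F}, \mathbb{P})$ are closed Hilbert subspaces, the result follows from the fact that a projection of
a projection is a projection. Alternatively, let

$$Y := \mathbb{E}[X|\mathcal{G}] \text{ a.s.} \qquad Z := \mathbb{E}[\mathbb{E}[X|\mathcal{G}]|\mathcal{H}] = \mathbb{E}[Y|\mathcal{H}] \text{ a.s.}$$

If $H \in \mathcal{H} \subseteq \mathcal{G}$, then

$$\mathbb{E}[Z; H] = \mathbb{E}[Y; H] = \mathbb{E}[X; H]$$

and hence $Z$ is a version of $\mathbb{E}[X|\mathcal{H}]$, i.e. $\mathbb{E}[\mathbb{E}[X|\mathcal{G}]|\mathcal{H}] = \mathbb{E}[X|\mathcal{H}]$ a.s.
The fact that $\mathbb{E}[\mathbb{E}[X|\mathcal{H}]|\mathcal{G}] = \mathbb{E}[X|\mathcal{H}]$ a.s. follows directly from (i).

(k): Let $Y := \mathbb{E}[X|\mathcal{G}]$. Since $Y$ is certainly $\mathcal{G} \vee \mathcal{H}$-measurable, we must show that
$\mathbb{E}[Y; F] = \mathbb{E}[X; F]$ for all $F \in \mathcal{G} \vee \mathcal{H}$. Now let $\mathcal{C} := \{G \cap H : G \in \mathcal{G}, H \in \mathcal{H}\}$, and let
$\mathcal{D} := \{F \in \mathcal{G} \vee \mathcal{H} : \mathbb{E}[Y; F] = \mathbb{E}[X; F]\}$. First note that $\mathcal{C} \subseteq \mathcal{D}$: For if $G \in \mathcal{G}, H \in \mathcal{H}$, then
$\mathbb{E}[X; G \cap H] = \mathbb{E}[XI_G]\,\mathbb{E}[I_H]$, by independence, and so

$$\mathbb{E}[X; G \cap H] = \mathbb{E}[X; G]\,\mathbb{E}[I_H] = \mathbb{E}[Y; G]\,\mathbb{E}[I_H] = \mathbb{E}[Y; G \cap H]$$

since $YI_G$ is independent of $I_H$. It is straightforward to verify that $\mathcal{C}$ is a $\pi$-system that
generates $\mathcal{G} \vee \mathcal{H}$, and that $\mathcal{D}$ is a $\lambda$-system. Hence by the $\pi$-$\lambda$ Theorem, $\mathcal{D} = \mathcal{G} \vee \mathcal{H}$.

⊣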
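Properties (a) and (j) are easy to test on a finite space, where conditional expectation is just averaging over blocks. In the sketch below $\mathcal{G}$ is generated by $\omega \bmod 4$ and $\mathcal{H}$ by $\omega \bmod 2$ on $\Omega = \{0, \dots, 7\}$, so that $\mathcal{H} \subseteq \mathcal{G}$; these choices, and the helper `cond_exp`, are illustrative assumptions.

```python
from fractions import Fraction

OMEGA = range(8)
P = {w: Fraction(1, 8) for w in OMEGA}

def cond_exp(X, label):
    """E[X | sigma(label)] as a dict; the blocks are the level sets of `label`."""
    blocks = {}
    for w in OMEGA:
        blocks.setdefault(label(w), []).append(w)
    out = {}
    for block in blocks.values():
        avg = sum(P[w] * X[w] for w in block) / sum(P[w] for w in block)
        for w in block:
            out[w] = avg
    return out

X = {w: Fraction(w * w) for w in OMEGA}
g = lambda w: w % 4    # generates G (finer)
h = lambda w: w % 2    # generates H (coarser); every H-block is a union of G-blocks

E_G = cond_exp(X, g)
E_H = cond_exp(X, h)
tower_1 = cond_exp(E_G, h)     # E[ E[X|G] | H ]
tower_2 = cond_exp(E_H, g)     # E[ E[X|H] | G ]
mean = sum(P[w] * X[w] for w in OMEGA)
mean_of_E_G = sum(P[w] * E_G[w] for w in OMEGA)
print(tower_1 == E_H, tower_2 == E_H, mean == mean_of_E_G)
```

The coarser $\sigma$-algebra always wins, whichever order the two conditionings are applied in.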

Definition 11.1.10 Let $U$ be an open convex subset of $\mathbb{R}^n$. A function $g : U \to \mathbb{R}$
is said to be convex if and only if for any $x, y \in U$ and any
$\lambda \in [0, 1]$ we have

$$g(\lambda x + (1 - \lambda)y) \le \lambda g(x) + (1 - \lambda)g(y)$$






Remarks 11.1.11 The following remarks feature in the proof of Jensen's inequality, the next
proposition, and should be digested thoroughly.

(a) Recall that if $x, y$ are points in $\mathbb{R}^n$, then $\{\lambda x + (1 - \lambda)y : 0 \le \lambda \le 1\}$ is simply the line
segment in $\mathbb{R}^n$ joining $x$ to $y$. The point with coordinates

$$\big(\lambda x + (1 - \lambda)y,\ g(\lambda x + (1 - \lambda)y)\big)$$

is simply a point on the graph of $g$ between $x$ and $y$. On the other hand, the point

$$\big(\lambda x + (1 - \lambda)y,\ \lambda g(x) + (1 - \lambda)g(y)\big)$$

is a point on the line segment joining $(y, g(y))$ to $(x, g(x))$. These two points have the
same $x$-coordinate, namely $\lambda x + (1 - \lambda)y$. We can now interpret convexity geometrically:
A function $g$ is convex if and only if its graph lies below any chord (line segment) joining
two points on the graph of $g$.

(b) In particular, if $g : \mathbb{R} \to \mathbb{R}$ has $g'' \ge 0$, then $g$ is a convex function.
The functions $x^+, x^-, |x|$ are also convex.

(c) Let $g : U \to \mathbb{R}$, where $U$ is an open subinterval of $\mathbb{R}$. For $u, v \in U$, define
$\Delta(u, v) = \frac{g(u) - g(v)}{u - v}$. Geometrically, $\Delta(u, v)$ is the slope of the chord joining $(u, g(u))$ to $(v, g(v))$ on
the graph of $g$. Then $g$ is convex if and only if $u < v < w$ in $U$ implies $\Delta(u, v) \le \Delta(v, w)$.
This is easy to see geometrically. A more rigorous proof: If $u < v < w$, define $\lambda = \frac{v - w}{u - w}$.
Then $v = \lambda u + (1 - \lambda)w$. It follows that

$$g(v) \le \lambda g(u) + (1 - \lambda)g(w) = \frac{v - w}{u - w}\,g(u) + \frac{u - v}{u - w}\,g(w)$$

Hence (multiplying by $u - w < 0$)

$$(u - v)g(v) + (v - w)g(v) \ge (v - w)g(u) + (u - v)g(w)$$

Rearranging yields the result.

(d) Now if $v < w$ in $U$ and we let $u \uparrow v$, then

(i) $\Delta(u, v)$ increases as $u \uparrow v$.

(ii) $\Delta(u, v) \le \Delta(v, w)$, and thus $\Delta(u, v)$ is bounded from above as $u \uparrow v$.

Since a sequence which is increasing and bounded from above must converge,
$D^-(v) = \lim_{u\uparrow v}\Delta(u, v)$ exists. Similar reasoning shows that $D^+(v) = \lim_{w\downarrow v}\Delta(v, w)$ must exist for every
$v \in U$. Thus left- and right derivatives exist at every point $v$. Moreover, $D^-(v) \le D^+(v)$,
because each $\Delta(u, v)$ is $\le$ each $\Delta(v, w)$. If these limits are equal, then $g$ is differentiable
at $v$.

(e) A convex function is automatically continuous, and thus a Borel function: For let $v \in U$.
If there is a discontinuity at $v$, then it is easy to see that either $\lim_{u\uparrow v}\Delta(u, v)$ or $\lim_{w\downarrow v}\Delta(v, w)$
does not exist.

□ 
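The chord-slope criterion in remarks (c) and (d) is easy to try out numerically. The following is a minimal sketch, not part of the notes: the helper name `chord_slope` and the sample points are illustrative choices. It evaluates the slopes for the convex function x^2 and the one-sided slopes of |x| at 0.

```python
def chord_slope(g, u, v):
    """Slope Delta(u, v) of the chord joining (u, g(u)) to (v, g(v))."""
    return (g(u) - g(v)) / (u - v)


def square(x):
    return x * x  # convex on R, since its second derivative is 2 >= 0


# Three-point test from remark (c): u < v < w should give Delta(u,v) <= Delta(v,w).
u, v, w = -1.0, 0.5, 2.0
left_slope = chord_slope(square, u, v)    # (1 - 0.25) / (-1.5) = -0.5
right_slope = chord_slope(square, v, w)   # (0.25 - 4) / (-1.5) = 2.5
print(left_slope, right_slope)

# Remark (d): for g(x) = |x| the one-sided slopes at v = 0 differ,
# so |x| is convex but not differentiable at 0.
h = 1e-3
d_minus = chord_slope(abs, -h, 0.0)   # chord slope just to the left of 0
d_plus = chord_slope(abs, 0.0, h)     # chord slope just to the right of 0
print(d_minus, d_plus)
```

For |x| every real m between the two printed one-sided slopes gives a supporting line at 0, which is the situation exploited in the proof of Jensen's inequality.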






Proposition 11.1.12 (Jensen's inequality)

Suppose that $g : U \to \mathbb{R}$ is a convex function on an open interval $U \subseteq \mathbb{R}$,
and that $X$ is a random variable with values in $U$ (a.s.) such that both
$X$ and $g(X)$ have finite expected values. Then

\[ \mathbb{E}[g(X)|\mathcal{G}] \ge g(\mathbb{E}[X|\mathcal{G}]). \]



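Proof: We use the notation and results from Remarks 11.1.11. Let $v \in U$, and let
$D^-(v) = \lim_{u \uparrow v} \Delta(u,v)$ and $D^+(v) = \lim_{w \downarrow v} \Delta(v,w)$. Then $D^-(v), D^+(v)$ both exist, and
$D^-(v) \le D^+(v)$.

Now suppose that $m$ is a real number satisfying $D^-(v) \le m \le D^+(v)$, and that $x \in U$. We
consider two cases: If (i) $x < v$, then $\Delta(x,v) \le D^-(v)$ (since $\Delta(u,v)$ increases as $u \uparrow v$)
and thus $\Delta(x,v) \le m$. Since $x - v < 0$, it follows that $g(x) \ge m(x - v) + g(v)$. Next, if (ii) $x > v$, then
$\Delta(v,x) \ge D^+(v)$ (because $\Delta(v,w)$ decreases as $w \downarrow v$) and thus $\Delta(v,x) \ge m$. It follows that
$g(x) \ge m(x - v) + g(v)$. Hence, in either case (and trivially when $x = v$), we have

\[ g(x) \ge m(x - v) + g(v) \]

for any $v \in U$, any $x \in U$, and any $D^-(v) \le m \le D^+(v)$.

We are now ready to prove Jensen's inequality: Put $v = \mathbb{E}[X|\mathcal{G}]$. Then

\[ g(X) \ge m(X - \mathbb{E}[X|\mathcal{G}]) + g(\mathbb{E}[X|\mathcal{G}]) \quad \text{a.s. whenever } D^-(\mathbb{E}[X|\mathcal{G}]) \le m \le D^+(\mathbb{E}[X|\mathcal{G}]). \]

If we now take conditional expectations on both sides, then

\[ \mathbb{E}[g(X)|\mathcal{G}] \ge m(\mathbb{E}[X|\mathcal{G}] - \mathbb{E}[X|\mathcal{G}]) + \mathbb{E}[g(\mathbb{E}[X|\mathcal{G}])|\mathcal{G}] = g(\mathbb{E}[X|\mathcal{G}]). \]

⊣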

Some notation: we define

\[ \mathbb{E}[X|\mathcal{G}|\mathcal{H}] := \mathbb{E}[\,\mathbb{E}[X|\mathcal{G}]\,|\,\mathcal{H}]. \]

As always, in the discrete world everything is simple: 



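Suppose that $X$ is a random variable on a discrete probability space
$(\Omega, \mathcal{F}, \mathbb{P})$, and that $\mathcal{G} \subseteq \mathcal{F}$ is generated by the partition $G_1, \dots, G_n$.
Then $\mathbb{E}(X|\mathcal{G})$ is constant on each $G_i$, and

\[ \mathbb{E}(X|\mathcal{G})(\omega) = \frac{\mathbb{E}[X; G_i]}{\mathbb{P}(G_i)} = \mathbb{E}[X|G_i] \quad \text{for } \omega \in G_i. \]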



You should verify the statement in the box. 
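As a concrete illustration of the boxed statement, here is a small sketch that computes a conditional expectation on a finite sample space by averaging over each block of a partition. The function name `cond_exp` and the fair-die example are assumptions made for illustration only; exact rational arithmetic is used so that the values can be compared exactly.

```python
from fractions import Fraction


def cond_exp(X, P, partition):
    """Return E[X | G] as a dict omega -> value, where G is generated by `partition`."""
    result = {}
    for block in partition:
        p_block = sum(P[w] for w in block)                  # P(G_i)
        avg = sum(X[w] * P[w] for w in block) / p_block     # E[X; G_i] / P(G_i)
        for w in block:
            result[w] = avg                                 # constant on the block G_i
    return result


# Hypothetical example: one roll of a fair die, G = information "odd or even".
omega = [1, 2, 3, 4, 5, 6]
P = {w: Fraction(1, 6) for w in omega}
X = {w: Fraction(w) for w in omega}
partition = [{1, 3, 5}, {2, 4, 6}]

Y = cond_exp(X, P, partition)
print(Y[1], Y[2])                        # value on the odd block, value on the even block
print(sum(Y[w] * P[w] for w in omega))   # E[E[X|G]], which should equal E[X]
```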

Exercise 11.1.13 Toss three fair coins one after the other, an R1 coin, an R2 coin and an R3
coin. You get to keep the coins which land H. The sample space is $\Omega = \{HHH, \dots, TTT\}$,
and $\mathbb{P}(\{\omega\}) = \frac{1}{8}$ for all $\omega \in \Omega$. Let $\mathcal{F}_n$ denote the information after the $n$th toss, for
$n = 0, 1, 2, 3$. Let $X$ be your winnings. Calculate the random variables $\mathbb{E}[X|\mathcal{F}_n]$.

□ 






We end this chapter with some simple examples: 

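Examples 11.1.14 (a) Suppose that $X$ is a random variable in $L^p(\Omega, \mathcal{F}, \mathbb{P})$, where $p \ge 1$,
and let $Y$ be a version of $\mathbb{E}(X|\mathcal{G})$. Then $Y$ is in $L^p(\Omega, \mathcal{F}, \mathbb{P})$ as well. This is because
$g(x) = |x|^p$ is convex. By Jensen's inequality, we therefore have $|\mathbb{E}(X|\mathcal{G})|^p \le \mathbb{E}(|X|^p\,|\,\mathcal{G})$
a.s., i.e. $|Y|^p \le \mathbb{E}(|X|^p\,|\,\mathcal{G})$. It follows that $\mathbb{E}|Y|^p \le \mathbb{E}[\mathbb{E}(|X|^p\,|\,\mathcal{G})] = \mathbb{E}|X|^p < +\infty$, by (I).

(b) Suppose that $X \in L^2(\Omega, \mathcal{F}, \mathbb{P})$ (i.e. that $\mathrm{var}(X)$ exists). If $Y$ is a version of $\mathbb{E}(X|\mathcal{G})$,
then $Y \in L^2(\Omega, \mathcal{F}, \mathbb{P})$ as well, by (a), and $\mathbb{E}Y^2 \le \mathbb{E}X^2$. Since $\mathbb{E}Y = \mathbb{E}X$ (by (I)), we thus
have

\[ \mathrm{var}(Y) = \mathbb{E}Y^2 - (\mathbb{E}Y)^2 \le \mathbb{E}X^2 - (\mathbb{E}X)^2 = \mathrm{var}(X). \]

Thus $\mathrm{var}(Y) \le \mathrm{var}(X)$. This reflects the fact that $Y$, being cruder, can't vary as much
as $X$ can.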

□ 
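The variance inequality in (b) can be checked by hand on a tiny example. The sketch below is an illustration under assumed data (two fair coin tosses, X the number of heads, and the information consisting of the first toss only); the helper names are hypothetical.

```python
from fractions import Fraction
from itertools import product

outcomes = list(product("HT", repeat=2))   # sample space for two fair tosses
prob = Fraction(1, len(outcomes))          # each outcome has probability 1/4


def X(w):
    return w.count("H")                    # number of heads


def Y(w):
    # E[X | first toss]: average X over the outcomes sharing the same first toss
    block = [v for v in outcomes if v[0] == w[0]]
    return Fraction(sum(X(v) for v in block), len(block))


def mean(f):
    return sum(f(w) * prob for w in outcomes)


def variance(f):
    m = mean(f)
    return sum((f(w) - m) ** 2 * prob for w in outcomes)


print(mean(X), mean(Y))           # equal means
print(variance(X), variance(Y))   # the conditional expectation varies less
```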

11.2 Theory of Martingales in Discrete Time 

Martingales are amongst the most important objects in probability theory, and an entire sub- 
discipline of finance is based on them. Brownian motion is the most important continuous- 
parameter martingale, and is heavily used in financial modelling. In this chapter we first 
introduce the basic results about discrete-time martingales at a leisurely rate, taking time 
to build up intuition and facility with martingale calculations. In the next chapter, we will 
tackle continuous-parameter martingales. 

11.2.1 Stochastic Processes and Filtrations 

Informally, a (discrete-parameter) stochastic process $X$ is a family of random variables
indexed by a discrete time set, i.e. $X = X_1, X_2, X_3, \dots$. The idea is that these model the
outcomes of a series of random phenomena, such as the closing values of the S&P 500. The
Xn are thus successive values of some quantity under consideration. Note that the times of 
the random variables may not be evenly distributed in physical time; for example, the share 
index is recorded only on trading days. 

We assume that the stochastic process $X = (X_n : n \in I)$ is defined on some probability
space $(\Omega, \mathcal{F}, \mathbb{P})$. The time index set $I$ will usually be the set of natural numbers, or the
set of non-negative reals, or some finite initial segment of these. For a particular outcome
$\omega \in \Omega$ the sequence $X_1(\omega), X_2(\omega), \dots$ is called a sample path of the process. Note that one
outcome/state-of-the-world $\omega$ determines the values of all the $X_n$. We only know the value
of $X_n$ at time $n$, and so as time $n$ increases, so does our knowledge of the state of the world.
Since information is organised in $\sigma$-algebras, we associate with each time $n$ a $\sigma$-algebra $\mathcal{F}_n$
modelling the knowledge at time $n$. We also assume that no information is lost or forgotten,
so that information available at time $n$ is also available at a later time $m \ge n$. This simply
means that $\mathcal{F}_m \supseteq \mathcal{F}_n$. We thus model the flow of information as follows:






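Definition 11.2.1 Suppose that $(\Omega, \mathcal{F}, \mathbb{P})$ is a probability space. An
increasing sequence

\[ \mathcal{F}_0 \subseteq \mathcal{F}_1 \subseteq \dots \subseteq \mathcal{F}_n \subseteq \dots \subseteq \mathcal{F} \]

of $\sigma$-algebras on $\Omega$ is called a filtration. We shall always assume that
$\mathcal{F}_0$ contains all the sets of measure 0.
We also define

\[ \mathcal{F}_\infty := \sigma\Big(\bigcup_n \mathcal{F}_n\Big). \]

$\mathcal{F}_n$ represents the available information at time $n$, i.e. it contains all events $A$ for which
it is possible to decide at time $n$ whether $A$ has occurred or not.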

Suppose that $X_t$ is the share price at time $t$. We know $X_2$ at time $t = 2$. Thus each of the following events
can be decided at time $t = 2$: Whether or not $X_2 = 5.00$; whether or not $X_2$ lies between 13.50 and 15.76,
etc. It therefore follows that $X_2$ must be $\mathcal{F}_2$-measurable, i.e. that $\sigma(X_2) \subseteq \mathcal{F}_2$. Moreover, $X_1$ is also known
at $t = 2$, so $\sigma(X_1, X_2) \subseteq \mathcal{F}_2$. However, at $t = 2$ we do not know the share price at time $t = 3$. Thus $X_3$ is not
$\mathcal{F}_2$-measurable, although it is, of course, $\mathcal{F}_3$-measurable.

In essence, to model the fact that the value of $X_m$ is known at a later time $n$, we need
to add the restriction that $X_m$ is $\mathcal{F}_n$-measurable for all $n \ge m$. This just means that
$\sigma(X_1, \dots, X_n) \subseteq \mathcal{F}_n$, and so we define:



Definition 11.2.2 A stochastic process $X = (X_n, n \in I)$ is said to
be adapted to a filtration $(\mathcal{F}_n, n \in I)$ provided that each $X_n$ is $\mathcal{F}_n$-measurable.
It follows trivially that this is the case if and only if

\[ \sigma(X_1, \dots, X_n) \subseteq \mathcal{F}_n \quad \text{for each } n. \]



Exercise 11.2.3 Make sure that you can prove this trivial result. 

□ 

Note that to say that $X$ is adapted to $\mathcal{F}_n$ simply means that the random variables $X_n$ do
not contain more information than the $\mathcal{F}_n$, although they may contain strictly less.

Note also that $\mathcal{F}_n = \sigma(X_1, \dots, X_n)$ is the smallest filtration with respect to which $X$ is
adapted, i.e. that if $X$ is also adapted to a filtration $\mathcal{G}_n$, then $\mathcal{F}_n \subseteq \mathcal{G}_n$. The filtration $\mathcal{F}_n$
contains just the information in the values of $X$ up to time $n$, and is called the natural or
canonical filtration of $X$. It contains just as much information as is contained in the $X_n$, and
no more.

11.2.2 Martingales, Submartingales, Supermartingales 

Martingales model a fair game, submartingales a favourable game, and supermartingales an 
unfavourable game. Here is the definition: 






Definition 11.2.4 A stochastic process $X = (X_n : n \in \mathbb{N})$ is called
a supermartingale (respectively submartingale) with respect to a
filtration $\mathcal{F}_n, n \in \mathbb{N}$ if and only if

(a) Each $X_n \in L^1(\Omega, \mathcal{F}, \mathbb{P})$ (†)

(b) $X$ is adapted to $\mathcal{F}_n, n \in \mathbb{N}$.

(c) $\mathbb{E}[X_{n+1}|\mathcal{F}_n] \le X_n$ (respectively, $\mathbb{E}[X_{n+1}|\mathcal{F}_n] \ge X_n$) for each $n \in \mathbb{N}$

A martingale is simultaneously a sub- and a supermartingale, i.e. it
satisfies $\mathbb{E}[X_{n+1}|\mathcal{F}_n] = X_n$ for each $n \in \mathbb{N}$.

When we say that $X$ is a (super/sub-)martingale, but we don't mention
a specific filtration, then the natural filtration should be used.

(†) i.e. each $X_n$ is integrable, which just means that $\mathbb{E}X_n$ exists, and is finite.



(Note that we've taken $\mathbb{N}$ as the index set. You shouldn't have any trouble generalizing
the definition to the case where the index set is some initial segment $\{0, 1, 2, \dots, T\}$ of $\mathbb{N}$).
Think of $X_n$ as your total fortune after the $n$th round of a gambling game. If $X$ is a
supermartingale, your expected fortune at time $n + 1$ is at most your fortune at time $n$. It
follows that this particular game is unfavourable, i.e. that you are likely to lose. If $X$ is a
martingale, then your expected fortune equals your present fortune: You are just as likely to
win as to lose, and the game is fair.

Examples 11.2.5 (a) Suppose that the $X_n, n \in \mathbb{N}$ are independent random variables with
$\mathbb{E}X_n = 0$, and that $\mathcal{F}_n, n \in \mathbb{N}$ is the natural filtration. Define $S_n = X_1 + \dots + X_n$. Clearly
$S = (S_n : n \in \mathbb{N})$ is a stochastic process adapted to $\mathcal{F}_n, n \in \mathbb{N}$, and each $S_n$ is integrable.
Moreover,

\[ \mathbb{E}[S_{n+1}|\mathcal{F}_n] = \mathbb{E}[X_1|\mathcal{F}_n] + \dots + \mathbb{E}[X_n|\mathcal{F}_n] + \mathbb{E}[X_{n+1}|\mathcal{F}_n]. \]

Since $X_m$ is $\mathcal{F}_n$-measurable for $m \le n$, it follows that $\mathbb{E}[X_m|\mathcal{F}_n] = X_m$ if $m \le n$.
Moreover, since the $X_n$ are independent random variables, $X_{n+1}$ is independent of $\mathcal{F}_n$,
and thus we have $\mathbb{E}[X_{n+1}|\mathcal{F}_n] = \mathbb{E}X_{n+1} = 0$. Hence

\[ \mathbb{E}[S_{n+1}|\mathcal{F}_n] = X_1 + \dots + X_n + 0 = S_n \]

which proves that $S_n, n \in \mathbb{N}$ is a martingale.

(b) If we have the same situation as in (a), but with $\mathbb{E}X_n \ge 0$ for all $n$, then $S_n, n \in \mathbb{N}$ is a
submartingale.

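(c) If $X_n, n \in \mathbb{N}$ are independent random variables with the same mean $\mu = 0$ and the same variance $\sigma^2$,
and if $S_n = X_1 + X_2 + \dots + X_n$, then the process $W_n = S_n^2 - n\sigma^2$ is a martingale with respect
to the natural filtration of the $X_n$. First note that each $W_n$ is integrable if and only if $S_n^2$
is, but this follows because the variances $\sigma^2 = \mathbb{E}X_n^2$ exist, so that each $X_n \in L^2(\Omega, \mathcal{F}, \mathbb{P})$.
To verify the martingale property, observe that $S_{n+1}^2 = S_n^2 + 2S_nX_{n+1} + X_{n+1}^2$. Further
observe that $\mathbb{E}[S_nX_{n+1}|\mathcal{F}_n] = S_n\mathbb{E}[X_{n+1}|\mathcal{F}_n]$, because $S_n$ is $\mathcal{F}_n$-measurable, and that
$\mathbb{E}[X_{n+1}|\mathcal{F}_n] = \mathbb{E}X_{n+1} = 0$, since $X_{n+1}$ is independent of $\mathcal{F}_n$. Thus:

\[
\begin{aligned}
\mathbb{E}[W_{n+1} - W_n|\mathcal{F}_n] &= \mathbb{E}[(S_n + X_{n+1})^2 - S_n^2 - \sigma^2\,|\,\mathcal{F}_n] \\
&= 2\mathbb{E}[S_nX_{n+1}|\mathcal{F}_n] + \mathbb{E}[X_{n+1}^2|\mathcal{F}_n] - \sigma^2 \\
&= 2S_n\mathbb{E}[X_{n+1}|\mathcal{F}_n] + \mathbb{E}X_{n+1}^2 - \sigma^2 \\
&= 2S_n \cdot \mathbb{E}X_{n+1} + \mathrm{Var}(X_{n+1}) - \sigma^2 \\
&= 0.
\end{aligned}
\]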

(d) Suppose that $X_n$ are independent non-negative random variables with $\mathbb{E}X_n = 1$. Put $M_0 = 1$, and
define

\[ M_n = X_1 \cdot X_2 \cdots X_n \]

for $n > 0$. Assume that each $M_n$ is integrable. It is left as an exercise to show that $M_n$
is a martingale.

(e) Consider a random walk. If it is symmetric, it is a martingale. If the probability p of 
going up is < 0.5, it is a supermartingale. 

(f) One more interesting martingale demonstrates the accumulation of information about
the value of a random variable over time. Let $Y$ be an integrable random variable (i.e.
$\mathcal{F}$-measurable). We do not necessarily know the value of $Y$ at time $n$, since there may not be
enough information available. However, as time passes, we expect that our estimate will
become more accurate. At time $n$, the best available approximation to $Y$ is $Y_n = \mathbb{E}[Y|\mathcal{F}_n]$.
We now show that $Y_n$ is a martingale (with respect to the filtration $\mathcal{F}_n$). Firstly,

\[ \mathbb{E}Y_n = \mathbb{E}[\mathbb{E}(Y|\mathcal{F}_n)] = \mathbb{E}Y \]

by the "Tower Property". This shows that each $Y_n$ is integrable if $Y$ is. Next,

\[ \mathbb{E}[Y_{n+1}|\mathcal{F}_n] = \mathbb{E}[\mathbb{E}[Y|\mathcal{F}_{n+1}]\,|\,\mathcal{F}_n] = \mathbb{E}[Y|\mathcal{F}_n] = Y_n \]

by the Tower Property again. This proves the result.

What this means is that there are no trends in our estimates of $Y$. At each new time
step, our revised estimate is just as likely to go up as it is to go down, and is expected
to remain at the same value as our previous estimate. This makes sense: If we expected
our estimates to increase, for example, then our estimates would not have been the best
available. We ought to have built the expectation of increase into our estimates already.

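(g) Note that if $X_n$ is a martingale, and if $\varphi$ is a convex function (with each $\varphi(X_n)$ integrable),
then $\varphi(X_n)$ is a submartingale. Indeed,

\[ \mathbb{E}[\varphi(X_{n+1})|\mathcal{F}_n] \ge \varphi(\mathbb{E}[X_{n+1}|\mathcal{F}_n]) = \varphi(X_n) \]

by Jensen's inequality. It follows that if $X_n$ is a martingale and if $p \ge 1$, then $|X_n|^p$ is a
submartingale.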

□ 
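Examples (a), (c) and (g) can be verified exactly for the simplest case, a symmetric walk with steps of size 1 (so that the variance of each step is 1). The sketch below is illustrative only: the helper `cond_exp_next` and the choice of n = 4 are assumptions. Conditioning on the natural filtration at time n amounts to fixing the path so far and averaging over the next step.

```python
from fractions import Fraction
from itertools import product


def cond_exp_next(f, path, steps=(-1, 1)):
    """E[f(path extended by one step) | path], for a next step uniform on `steps`."""
    return sum(Fraction(f(path + (s,))) for s in steps) / len(steps)


def S(path):
    return sum(path)                      # S_n = X_1 + ... + X_n


def W(path):
    return sum(path) ** 2 - len(path)     # W_n = S_n^2 - n*sigma^2 with sigma^2 = 1


def S_squared(path):
    return sum(path) ** 2                 # convex function of the martingale S_n


n = 4
for path in product((-1, 1), repeat=n):
    assert cond_exp_next(S, path) == S(path)                    # martingale, example (a)
    assert cond_exp_next(W, path) == W(path)                    # martingale, example (c)
    assert cond_exp_next(S_squared, path) >= S_squared(path)    # submartingale, example (g)
print("all", 2 ** n, "paths checked")
```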

Remarks 11.2.6 (a) If $X_n, n \in \mathbb{N}$ is a martingale, then $\mathbb{E}X_n = \mathbb{E}X_0$ for all $n$, i.e. all the
$X_n$ have the same mean. This is an easy exercise.






(b) We have defined the martingale property with respect to a filtration. Thus if $X_n$ is a
martingale with respect to one filtration, it may not be with respect to another. However,
if $X_n$ is a martingale with respect to some filtration $\mathcal{G}_n$, and if $\mathcal{F}_n = \sigma(X_1, \dots, X_n)$ is the
natural filtration, then $X_n$ is also a martingale with respect to $\mathcal{F}_n$. To see this, first note
that each $X_n$ is $\mathcal{G}_n$-measurable (because $X_n$ is adapted to $\mathcal{G}_n$, which is part of the definition of
martingale). Thus $\mathcal{F}_n \subseteq \mathcal{G}_n$ for each $n$. It now follows by the Tower Property that

\[ \mathbb{E}[X_{n+1}|\mathcal{F}_n] = \mathbb{E}[\mathbb{E}[X_{n+1}|\mathcal{G}_n]\,|\,\mathcal{F}_n] = \mathbb{E}[X_n|\mathcal{F}_n] = X_n. \]

The last equality holds by "Taking out what is known", because $X_n$ is $\mathcal{F}_n$-measurable.
It is now not hard to see that if $X_n$ is a martingale with respect to one filtration, it will
also be a martingale with respect to any poorer (in information) filtration to which it is
adapted.

(c) The converse of (b) is not true: If Xn is a martingale with respect to the natural filtration, 
it may not be a martingale with respect to a richer (in information) filtration. Find a 
simple example! 

(d) Note that if $X_n$ is a martingale, and if $m \ge n$, then $\mathbb{E}[X_m|\mathcal{F}_n] = X_n$. This is left as
another exercise in the use of the Tower Property.

□ 

The following exercise will prove extremely useful: 

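Exercise 11.2.7 (Orthogonality of Martingale Increments)
Prove that if $M_n$ is a martingale, then

\[ \mathbb{E}[(M_n - M_m)^2\,|\,\mathcal{F}_k] = \mathbb{E}[M_n^2 - M_m^2\,|\,\mathcal{F}_k], \qquad k \le m \le n. \]

Deduce that

\[ \mathbb{E}[M_n^2] = \mathbb{E}M_0^2 + \sum_{m=1}^{n} \mathbb{E}[(M_m - M_{m-1})^2]. \]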

□ 

11.2.3 Games and Strategies 

Suppose that you take part in a game of chance, e.g. a game of coin tossing, roulette, or 
investing in the stock market. The game is repeated many times, and you place a bet each 
time. Let $\xi_n, n \in \mathbb{N}$ be a sequence of integrable random variables which represent your
winnings (or losses, if negative) per unit stake in the $n$th game. Thus, if you had wagered a
stake $C_n$ on the $n$th game, you would have won $C_n\xi_n$.

If you played unit stakes all the way through, your total winnings after the $n$th game
would be

\[ S_n = \xi_1 + \dots + \xi_n \quad \text{for } n \ge 1. \]

Note that $S_0 = 0$, because you haven't won or lost anything yet.

If the game is fair then your chance of winning is the same as your chance of losing,
and thus $\mathbb{E}\xi_n = 0$. In that case, $S_n$ is a martingale with respect to the natural filtration
$\mathcal{F}_n = \sigma(\xi_1, \dots, \xi_n) = \sigma(S_1, \dots, S_n)$. Similarly, if the game is unfavourable to you, then at
time $n$ you expect your winnings at time $n + 1$ to be less than your current winnings, i.e.
$\mathbb{E}[S_{n+1}|\mathcal{F}_n] \le S_n$. Thus an unfavourable game is modelled by a supermartingale. A favourable
game will clearly be modelled by a submartingale.

Suppose now that you have a system, i.e. a gambling strategy, which tells you when to
bet, how much to bet etc. Your system, call it $C$, will tell you what stake $C_n$ you should
place on the $n$th game. We allow negative stakes as well (which are essentially bets that you
will lose)¹. In that case, your total winnings after the $n$th game will be

\[ W_n = C_1\xi_1 + \dots + C_n\xi_n. \]

Now note that $\xi_n = S_n - S_{n-1} = \Delta S_n$, and thus that

\[ W_n = \sum_{k=1}^{n} C_k(S_k - S_{k-1}) = \sum_{k=1}^{n} C_k \Delta S_k \]

which looks like a Riemann-Stieltjes sum². Your strategy $C = (C_n : n \in I)$ is also a stochastic
process, but since we have to decide what stake to wager before the outcome of the $n$th game
is known, we must be able to decide the value of $C_n$ on the basis of information available at
time $n - 1$ (i.e. after the $(n-1)$th game). Thus each $C_n$ is $\mathcal{F}_{n-1}$-measurable. We have a
name for this:

Definition 11.2.8 A stochastic process $C$ is called previsible (or non-anticipative,
or predictable), with respect to a filtration $\mathcal{F}_n$ provided that
each $C_n$ is $\mathcal{F}_{n-1}$-measurable, for $n \ge 1$. Note that $C_0$ is not defined.

Thus a gambling strategy is just a previsible process. 

Consider an arbitrary adapted stochastic process $Y_n$. Then in general $Y_n$ may exhibit both
purely random behaviour and long-term trends. For example, for supermartingales the
long-term trend is that it tends to decrease. Purely random behaviour is described by martingales,
and trends are known beforehand, i.e. are previsible. We thus attempt to decompose $Y_n$ into
a martingale part and a previsible part, i.e. we try to write

\[ Y_n = M_n + A_n \]

where $M_n$ is a martingale, with $M_0 = Y_0$, and $A_n$ is previsible, with $A_0 = 0$. In engineering,
$A_n$ is called the signal, and $M_n$ the noise.

Suppose that we can actually find such a decomposition. We would then have

\[ Y_{n+1} - Y_n = (M_{n+1} - M_n) + (A_{n+1} - A_n). \]

Taking conditional expectations with respect to $\mathcal{F}_n$ immediately yields

\[ A_{n+1} - A_n = \mathbb{E}[Y_{n+1}|\mathcal{F}_n] - Y_n \]

so that

\[ M_{n+1} - M_n = Y_{n+1} - \mathbb{E}[Y_{n+1}|\mathcal{F}_n]. \]

We now use this pair of equations to define $M_n$ and $A_n$ in the next theorem.

¹ We need negative stakes to model short sales, which are essentially just bets that a stock will lose value.
² The Riemann-Stieltjes integral is discussed in Chapter ??






Theorem 11.2.9 (Doob Decomposition Theorem)
Every adapted integrable process $Y_n$ has a unique decomposition

\[ Y_n = M_n + A_n \]

where $M_n$ is a martingale with $M_0 = Y_0$, and $A_n$ is previsible, null at
$n = 0$. Moreover, if $Y_n$ is a supermartingale, then $A_n$ is decreasing.

Proof: Define $M_n, A_n$ inductively by

\[
\begin{cases}
M_0 = Y_0, & M_{n+1} = M_n + Y_{n+1} - \mathbb{E}[Y_{n+1}|\mathcal{F}_n] \\
A_0 = 0, & A_{n+1} = A_n + \mathbb{E}[Y_{n+1}|\mathcal{F}_n] - Y_n
\end{cases}
\]

It is clear that $M_n$ is a martingale and that $A_n$ is previsible. Moreover

\[ Y_m - Y_{m-1} = (M_m - M_{m-1}) + (A_m - A_{m-1}); \]

summing over $m$ from $m = 1$ to $m = n$ (and using $M_0 = Y_0$, $A_0 = 0$) yields

\[ Y_n = M_n + A_n \]

as required.

To see that this decomposition is unique, suppose that $Y_n = M'_n + A'_n$ is another
decomposition with the same properties. We show by induction on $n$ that $M = M'$, $A = A'$: Note
that $M_0 = M'_0$ by definition. Suppose that $M_n = M'_n$, and, consequently, that $A_n = A'_n$.
Then

\[ M_{n+1} - M'_{n+1} = A'_{n+1} - A_{n+1}. \]

Taking conditional expectations with respect to $\mathcal{F}_n$ we obtain

\[ 0 = M_n - M'_n = A'_{n+1} - A_{n+1} \]

because $A, A'$ are previsible. Hence $A_{n+1} = A'_{n+1}$ and so $M_{n+1} = M'_{n+1}$ as well. By
induction, we have $M_n = M'_n$, $A_n = A'_n$ for all $n \in \mathbb{N}$. This proves that the Doob Decomposition is
unique.

If $Y_n$ is a supermartingale, then $\mathbb{E}[Y_{n+1}|\mathcal{F}_n] \le Y_n$, so the definition of $A_{n+1}$ implies that
$A_{n+1} \le A_n$.

⊣
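The inductive construction in the proof can be carried out explicitly for a simple process. The sketch below is an illustration under assumed data: a biased walk with steps +1 (probability 1/3) and -1 (probability 2/3), with all function names hypothetical. It builds M and A along one path exactly as in the two recursions, using exact fractions.

```python
from fractions import Fraction

# Hypothetical biased game: each step is +1 with probability 1/3, -1 with probability 2/3.
STEPS = {1: Fraction(1, 3), -1: Fraction(2, 3)}


def Y(path):
    return sum(path)   # the adapted process Y_n = sum of the first n steps


def cond_exp_next(f, path):
    """E[f(path extended by one step) | path] under the step distribution STEPS."""
    return sum(p * f(path + (s,)) for s, p in STEPS.items())


def doob(path):
    """Return the lists (M_0..M_n, A_0..A_n) along the given path."""
    M = [Fraction(Y(()))]   # M_0 = Y_0
    A = [Fraction(0)]       # A_0 = 0
    for k in range(len(path)):
        prefix, longer = path[:k], path[:k + 1]
        predicted = cond_exp_next(Y, prefix)      # E[Y_{k+1} | F_k]
        M.append(M[-1] + Y(longer) - predicted)   # martingale increment
        A.append(A[-1] + predicted - Y(prefix))   # previsible increment
    return M, A


path = (1, -1, -1, 1)
M, A = doob(path)
print(A)   # decreasing, as expected for a supermartingale
print(M)
```

In this example the previsible part turns out to be deterministic, A_n = -n/3, which is simply the accumulated expected loss per game.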

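Exercise 11.2.10 Suppose that $Y_t$ is a martingale. By Jensen's inequality, $Y_t^2$ will be a
submartingale, and thus have an increasing trend. The previsible trend part $A_t$ of $Y_t^2$ is
called the quadratic variation, for the following reason:

\[ \Delta A_t = \mathbb{E}[Y_t^2 - Y_{t-1}^2\,|\,\mathcal{F}_{t-1}] = \mathbb{E}[(Y_t - Y_{t-1})^2\,|\,\mathcal{F}_{t-1}] = \mathbb{E}[(\Delta Y_t)^2\,|\,\mathcal{F}_{t-1}] \]

so that $A_t = \sum_{s=1}^{t} \mathbb{E}[(\Delta Y_s)^2\,|\,\mathcal{F}_{s-1}]$. Prove this.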

□ 






In the continuous-time theory, the generalization of the Doob decomposition to the
Doob-Meyer Decomposition Theorem for submartingales is a deep result. The quadratic variation
process associated with a submartingale is of great importance in deriving a general theory 
of stochastic integration. 

Definition 11.2.11 If $C$ is a previsible process, and if $X$ is adapted
(both with respect to a filtration $\mathcal{F}_n, n \in \mathbb{N}$), then the martingale
transform of $X$ by $C$ is the process $W$ given by

\[ W_0 = 0, \qquad W_n = \sum_{k=1}^{n} C_k(X_k - X_{k-1}) \quad \text{if } n > 0. \]

The process $W$ is generally denoted by $C \bullet X$, and $W_n$ by $(C \bullet X)_n$.

Thus the martingale transform of $X$ by $C$ is simply your winnings process on the game
$X$ using the gambling strategy $C$. Now comes the crunch:



Theorem 11.2.12 (a) Suppose that $X$ is a martingale, and that $C$ is
a bounded previsible process. Then $C \bullet X$ is a martingale.

(b) If $X$ is a supermartingale (submartingale), and $C$ is a bounded
non-negative previsible process, then $C \bullet X$ is a supermartingale
(submartingale).

Proof: (a) Let $W = C \bullet X$. The fact that $C$ is bounded and that each $X_n$ is integrable
implies that $W_n$ is integrable as well. Then $W_{n+1} - W_n = C_{n+1}(X_{n+1} - X_n)$. Using the fact
that $X_n$ and $C_{n+1}$ are $\mathcal{F}_n$-measurable, we see that

\[ \mathbb{E}[W_{n+1} - W_n|\mathcal{F}_n] = C_{n+1}(\mathbb{E}[X_{n+1}|\mathcal{F}_n] - X_n) = 0 \]

so that $\mathbb{E}[W_{n+1}|\mathcal{F}_n] = \mathbb{E}[W_n|\mathcal{F}_n] = W_n$.

The proof of (b) is left as an exercise.

⊣

This theorem has the following important consequence for games of chance: You cannot 
find a previsible trading strategy which will turn a fair game to your advantage, i.e. which 
will turn a martingale into a submartingale. No matter what your strategy, your winnings 
process will still be a martingale. 

As a final remark, note that if $X$ is a (super-, sub-)martingale, and $a$ is a constant, then
$Y = X + a$ is a (super-, sub-)martingale, and moreover $C \bullet X = C \bullet Y$.
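To see Theorem 11.2.12(a) at work, one can enumerate a fair coin-tossing game over a short horizon and apply a "double after every loss" rule. This is a hedged sketch: the strategy, the horizon n = 5 and the function names are illustrative assumptions, and the stakes are bounded only because the horizon is finite.

```python
from fractions import Fraction
from itertools import product


def doubling_stake(history):
    """Stake for the next game, decided from past outcomes only (hence previsible)."""
    stake = 1
    for xi in history:
        stake = 2 * stake if xi == -1 else 1   # double after a loss, reset after a win
    return stake


def transform(xis, strategy):
    """(C . S)_n = sum over k of C_k * xi_k, where C_k depends only on xi_1..xi_{k-1}."""
    return sum(strategy(xis[:k]) * xis[k] for k in range(len(xis)))


n = 5
paths = list(product((-1, 1), repeat=n))   # all equally likely outcomes of n fair games
expected_winnings = sum(Fraction(transform(p, doubling_stake)) for p in paths) / len(paths)
print(expected_winnings)


def is_martingale_step(prefix):
    """Check E[W_{k+1} | F_k] = W_k on the event that the first k outcomes equal `prefix`."""
    nxt = [transform(prefix + (s,), doubling_stake) for s in (-1, 1)]
    return Fraction(sum(nxt), 2) == transform(prefix, doubling_stake)


print(all(is_martingale_step(p[:k]) for p in paths for k in range(n)))
```

Replacing `doubling_stake` with any other function of the past outcomes leaves both checks unchanged, which is the content of the theorem for this toy game.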






Chapter 12 



PDEs in Finance, with a Detour
Through Black-Scholes



In this chapter we derive the Black-Scholes equations for European options. The
Black-Scholes price is not model independent, i.e. it depends on the model we chose for stock
prices. Accordingly, the first section of this chapter is concerned with developing a model of
stock price behaviour. In the second section, we develop the machinery of stochastic calculus
in an intuitive and non-rigorous manner. It should be pointed out that the motivation for Ito's
formula provided in this section is pretty flimsy, and we do not claim to give a mathematically
accurate account.

Having Ito's formula at our disposal, we then derive the Black-Scholes PDE, again in an
intuitive non-rigorous manner. Instead of solving the PDE directly — we will do that later 
— we note that the PDE has a surprising property: The drift of the underlying asset does 
not occur in the PDE. This allows us to use the machinery of risk-neutral valuation to derive 
the Black-Scholes option prices. 

12.1 Modelling Stock Prices 

Any model of stock price behaviour must be stochastic, i.e. it must incorporate the random
nature of price behaviour. The simplest such models are random walks: Let $X_t, t = 1, 2, \dots$
be a family of identically distributed random variables, and let $S_0$ be the stock price at $t = 0$. We might
(naively) attempt to model the stock price process by

\[ S_t = S_{t-1} + X_t \quad \text{i.e.} \quad S_t = S_0 + \sum_{u=1}^{t} X_u. \]

The intuition behind this is that the price at time $t$ equals the price at time $t - 1$ plus a
"random shock", modelled by $X_t$.

We should also assume that these shocks are independent. Why? If we could predict
today that the stock price is going to go up tomorrow, this makes the stock more attractive
today. Thus more people would buy it today, forcing the stock price up today, until it reaches
the level predicted. Thus any change in the stock price must essentially be unpredictable.
This is just a version of the Efficient Markets Hypothesis, which, loosely, asserts that all available
information about a corporation is instantly reflected in its stock price. Thus future changes
in price are not dependent on past changes in price.

There are several reasons why a random walk model of stock prices is inadequate, but an
obvious one is that it doesn't take into account scale: for stock prices, we expect the change
in price to be proportional to the current price. To see this, consider two companies in two
parallel universes, A and B. The universes and the companies are identical, except for one
thing. In universe A, the company has issued 100 shares, each trading at $100. In universe
B, the company has undertaken a 2-for-1 stock split, so that it has issued 200 shares, each
trading at $50. Both companies are otherwise identical, e.g. they are both worth $10 000.
One day an earthquake causes massive damage, and both companies lose half their value. The
shares in universe A now trade at $50, whereas those in universe B trade at $25. Thus the
share price has not dropped by the same amount in both universes: Each share has lost the
same proportion of its value.

Simply put, if investors require a return of 14%, then they require that return irrespective 
of whether the share price is $50 or $100. 

The shares of A, B change by the same factor, i.e. they have exactly the same change in
returns (but not the same absolute change in price). This is reflected in, e.g., the binomial
model, where shares can go up by a factor of $u$ or down by a factor of $\frac{1}{u}$. But a multiplicative
change in the stock price amounts to an additive change in the logarithm of the stock price:

\[ S_{t+\Delta t} = u^{\pm 1} S_t \quad \text{implies} \quad \ln S_{t+\Delta t} = \ln S_t \pm \ln u \]

i.e. if we define the returns process $R_t$ by $S_t = S_0 e^{R_t}$ (i.e. $R_t := \ln \frac{S_t}{S_0}$), and define $\delta := \ln u$,
we have

\[ R_{t+\Delta t} = R_t \pm \delta. \]

A better random model of stock prices is therefore one in which the returns process $R_t$
follows a random walk.
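The stock-split example can be put into numbers directly. The short sketch below uses the prices from the two universes described above; the variable names are illustrative. It contrasts the absolute price changes with the changes in log-price, i.e. the increments of the returns process.

```python
import math

# Share prices before and after the earthquake, in the two universes of the example.
price_a_before, price_a_after = 100.0, 50.0   # universe A: 100 shares
price_b_before, price_b_after = 50.0, 25.0    # universe B: 200 shares after the split

abs_change_a = price_a_after - price_a_before   # additive change in price
abs_change_b = price_b_after - price_b_before

log_return_a = math.log(price_a_after / price_a_before)   # change in ln(price)
log_return_b = math.log(price_b_after / price_b_before)

print(abs_change_a, abs_change_b)     # differ: the additive shock depends on scale
print(log_return_a, log_return_b)     # agree: both equal ln(1/2)
```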



12.1.1 Modelling Returns in Continuous Time

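We now seek a continuous-time version of the random walk, i.e. a stochastic process that is
changing because of random shocks at every instant in time. Consider a time interval $[0, T]$
and let $N$ be a (large) integer. Define $\Delta t := \frac{T}{N}$. Let $X_n, n = 1, 2, 3, \dots$ be independent
Bernoulli random variables with

\[ \mathbb{P}(X_n = \Delta x) = p \quad \text{and} \quad \mathbb{P}(X_n = -\Delta x) = 1 - p =: q \]

where $\Delta x > 0$. For $t = 0, \Delta t, 2\Delta t, \dots, N\Delta t = T$, let $R_t := \sum_{k=1}^{n} X_k$, where $t = n\Delta t$. Thus
$R_t$ is a random walk, and

\[ R_{t+\Delta t} = R_t \pm \Delta x. \]

Some simple calculations yield

\[ \mathbb{E}[R_t] = n(p - q)\Delta x = (p - q)\frac{\Delta x}{\Delta t}\, t, \qquad \mathrm{Var}(R_t) = n\big(\Delta x^2 - (p - q)^2\Delta x^2\big) = 4pq\frac{\Delta x^2}{\Delta t}\, t. \]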






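Now suppose we can observe the process $R_t$ and want $\mathbb{E}[R_t] = \mu t$ and $\mathrm{Var}(R_t) = \sigma^2 t$, where
$\mu, \sigma$ are constants, and $\sigma > 0$. (We want $\sigma > 0$, otherwise $\mathrm{Var}(R_t) = 0$, in which case $R_t$
would be non-random.)

In the continuous limit, i.e. as $N \to \infty$ and $\Delta t \to 0$, we must have

\[ (p - q)\frac{\Delta x}{\Delta t} \to \mu \qquad \text{and} \qquad 4pq\frac{\Delta x^2}{\Delta t} \to \sigma^2. \]

The first equation yields $\Delta x \approx \frac{\mu \Delta t}{p - q}$ when $\Delta t$ is small. Substituting into the second equation,
we see that

\[ \frac{4pq}{(p - q)^2}\,\mu^2 \Delta t \approx \sigma^2 \]

when $\Delta t$ is small. Now since $\Delta t \to 0$, we must have $\frac{4pq}{(p-q)^2} \to \infty$, for otherwise the product
$\frac{4pq}{(p-q)^2}\mu^2\Delta t$ would tend to 0, not $\sigma^2$. It is therefore necessary that $p - q \to 0$, and thus $p, q$ must
both tend to $\frac{1}{2}$ as $\Delta t \to 0$. From the fact that $4pq\frac{\Delta x^2}{\Delta t} \to \sigma^2$, we then see that we must have

\[ \Delta x \approx \sigma\sqrt{\Delta t} \]

for small $\Delta t$.

We had $\Delta x \approx \frac{\mu \Delta t}{p - q}$ for small $\Delta t$, and thus $p - q \approx \frac{\mu}{\sigma}\sqrt{\Delta t}$. Since $p + q = 1$, we must have

\[ p = \tfrac{1}{2}\Big(1 + \frac{\mu}{\sigma}\sqrt{\Delta t}\Big), \qquad q = \tfrac{1}{2}\Big(1 - \frac{\mu}{\sigma}\sqrt{\Delta t}\Big). \]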


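As a check, note that

\[ \mathbb{E}[R_t] = (p - q)\frac{\Delta x}{\Delta t}\, t = \frac{\mu}{\sigma}\sqrt{\Delta t} \cdot \frac{\sigma\sqrt{\Delta t}}{\Delta t}\, t = \mu t \]

and

\[ \mathrm{Var}(R_t) = 4pq\frac{\Delta x^2}{\Delta t}\, t = \Big(1 - \frac{\mu^2}{\sigma^2}\Delta t\Big)\frac{\sigma^2 \Delta t}{\Delta t}\, t = \sigma^2 t - \mu^2 t \Delta t \approx \sigma^2 t \]

as should be the case.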
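The check above is also easy to reproduce numerically. The following sketch plugs the choices of step size and probabilities into the exact moment formulas for the walk; the function names and the parameter values (mu = 0.1, sigma = 0.2, T = 1, 10 000 steps) are illustrative assumptions.

```python
import math


def walk_parameters(mu, sigma, dt):
    """Step size and up/down probabilities of the approximating random walk."""
    dx = sigma * math.sqrt(dt)
    p = 0.5 * (1 + (mu / sigma) * math.sqrt(dt))
    return dx, p, 1 - p


def walk_moments(mu, sigma, t, n):
    """Exact mean and variance of R_t for the n-step walk with dt = t / n."""
    dt = t / n
    dx, p, q = walk_parameters(mu, sigma, dt)
    mean = n * (p - q) * dx            # n (p - q) dx
    var = n * 4 * p * q * dx ** 2      # n (dx^2 - (p - q)^2 dx^2) = 4 n p q dx^2
    return mean, var


mean, var = walk_moments(mu=0.1, sigma=0.2, t=1.0, n=10_000)
print(mean)   # close to mu * t
print(var)    # close to sigma^2 * t, short by mu^2 * t * dt
```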

We now have an idea of how to create a continuous-time stochastic process $R_t$ as the
$(\Delta t \to 0)$-limit of a random walk. But the limit process has some peculiar features. For
example

\[ \Delta R_t = \pm\sigma\sqrt{\Delta t} \quad \text{is of the order of } \sqrt{\Delta t}. \]

If $f(t)$ is a differentiable function, then

\[ \Delta f(t) \approx f'(t)\Delta t \quad \text{is of the order of } \Delta t. \]

Now when $\Delta t$ is small, we see that $\sqrt{\Delta t}$ is much larger than $\Delta t$. (Take, e.g. $\Delta t = 10^{-2n}$ and
note that $\sqrt{\Delta t} = 10^{-n} = 10^{n}\Delta t$.) It follows that $R_t$ cannot be differentiable as a function of
$t$.

The probabilist will immediately want to know the distribution of $R_t$. Let $u(t, x)$ be the
density of the random variable $R_t$, i.e.

\[ u(t, x)\,\Delta x \approx \mathbb{P}(R_t \in [x, x + \Delta x]). \]






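At time $t + \Delta t$ the random walk can reach the point $x$ in two ways: It can move right from
the point $x - \Delta x$ at time $t$, with probability $p$, or it can move left from the point $x + \Delta x$,
with probability $q$. Thus

\[ u(t + \Delta t, x) = p\,u(t, x - \Delta x) + q\,u(t, x + \Delta x). \]

Now we Taylor expand up to order $\Delta t$. Firstly

\[ u(t + \Delta t, x) = u(t, x) + u_t(t, x)\Delta t + o(\Delta t). \]

Next,

\[ u(t, x \pm \Delta x) = u(t, x) \pm u_x(t, x)\Delta x + \tfrac{1}{2}u_{xx}(t, x)\Delta x^2 + o(\Delta x^2). \]

Here, we have taken a second-order Taylor expansion, because $\Delta x$ is of the order $\sqrt{\Delta t}$, and
$\Delta x^2$ of the order $\Delta t$. Putting these together, we obtain (at the point $(t, x)$):

\[ u + u_t\Delta t = (p + q)u + (-p + q)u_x\Delta x + \tfrac{1}{2}(p + q)u_{xx}\Delta x^2. \]

However, we know that $p = \tfrac{1}{2}(1 + \frac{\mu}{\sigma}\sqrt{\Delta t})$, so that $-p + q = -\frac{\mu}{\sigma}\sqrt{\Delta t}$, and that $\Delta x \approx \sigma\sqrt{\Delta t}$ and $p + q = 1$. Hence

\[ u_t\Delta t = -\Big(\frac{\mu}{\sigma}\sqrt{\Delta t}\Big)u_x\big(\sigma\sqrt{\Delta t}\big) + \tfrac{1}{2}u_{xx}\sigma^2\Delta t \]

which, after dividing by $\Delta t$, yields the following partial differential equation for the density of $R_t$:

\[ u_t = -\mu u_x + \tfrac{1}{2}\sigma^2 u_{xx}. \]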

However, the PDE is not sufficient to determine the density $u$: It has many solutions. We
seek a solution which has the following properties:

• For each $t > 0$, we have $\int_{-\infty}^{\infty} u(t, x)\,dx = 1$, because $u(t, x)$ is a density, and

• $u(0, x)$ is rather odd: We have $R_0 = 0$, and so

\[ f(0) = \mathbb{E}[f(R_0)] = \int_{-\infty}^{\infty} f(x)\,u(0, x)\,dx \]

i.e. $u(0, x)$ is a "function" with the property that $\int_{-\infty}^{\infty} f(x)\,u(0, x)\,dx = f(0)$ for every function
$f$. The "function" with this property is called the Dirac delta $\delta_0$. It is not a function
at all (but the simplest example of a so-called generalized function or distribution (in
the sense of Schwartz)). Nevertheless, we can get some intuition as to how $u$ ought to
behave. We see that for $t$ close to 0, the density $u(t, x)$ must be very small for $x \ne 0$,
because $R_t$ must be close to 0 when $t$ is near zero. Yet the area under the curve is 1,
i.e. $u(t, x)$ must be extremely peaked at around $x = 0$ and then rapidly drop off. We
may thus think of $u(0, x) = \delta_0$ as a "function" which has

\[ \int_{-\infty}^{\infty} \delta_0(x)\,dx = 1. \]

Oddly enough, we can find such a function. The PDE for the density, derived by Einstein
in 1905, is a version of the heat equation, derived by Fourier, which governs heat transfer.
So this PDE was not new: It had been intensively studied by physicists, with $u(t, x)$ playing
the role of the temperature at time $t$ at a point $x$ in an infinitely long rod. The fundamental
solution or Green's function of such a PDE was well-known:

\[ u(t, x) = \frac{1}{\sqrt{2\pi\sigma^2 t}}\; e^{-\frac{(x - \mu t)^2}{2\sigma^2 t}}. \]

We will give a derivation of this result later on, but you can verify by direct differentiation
that this function does, in fact, satisfy the PDE. You will also immediately recognize it as the
density of an $N(\mu t, \sigma^2 t)$-random variable. Furthermore, for $t$ near 0, such a random variable
has very small standard deviation, and thus the density is extremely peaked around 0, just
as we require.
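Short of differentiating by hand, one can at least test numerically that the normal density with mean mu*t and variance sigma^2*t satisfies u_t = -mu u_x + (sigma^2/2) u_xx. The sketch below is only a sanity check, not a proof: it uses central finite differences, and the function names, the evaluation point and the parameter values are arbitrary illustrative choices.

```python
import math


def density(t, x, mu, sigma):
    """Density of N(mu*t, sigma^2*t) evaluated at x."""
    var = sigma ** 2 * t
    return math.exp(-(x - mu * t) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def pde_residual(t, x, mu, sigma, h=1e-3):
    """Central-difference estimate of u_t + mu*u_x - (sigma^2/2)*u_xx at (t, x)."""
    def u(s, y):
        return density(s, y, mu, sigma)

    u_t = (u(t + h, x) - u(t - h, x)) / (2 * h)
    u_x = (u(t, x + h) - u(t, x - h)) / (2 * h)
    u_xx = (u(t, x + h) - 2 * u(t, x) + u(t, x - h)) / h ** 2
    return u_t + mu * u_x - 0.5 * sigma ** 2 * u_xx


residual = pde_residual(t=1.0, x=0.3, mu=0.5, sigma=1.0)
print(abs(residual))   # small compared with the size of the individual terms
```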

It follows, therefore, that the distribution of $R_t$ is $N(\mu t, \sigma^2 t)$. Of course, the Central Limit
Theorem states that, subject to a moment condition, large sums of i.i.d. random variables are roughly normally
distributed, so we are not surprised. But here, we have in essence given a proof of the Central
Limit Theorem by PDE methods, at least for random walks of the type described.

When we take $\mu = 0$ and $\sigma = 1$, we obtain one of the basic building blocks of financial
modelling:

Definition 12.1.1 Standard Brownian motion is a continuous-time stochastic process $B_t, t \ge 0$
with the following properties:

(1) Each change

\[ B_t - B_s = (B_{s+h} - B_s) + (B_{s+2h} - B_{s+h}) + \dots + (B_t - B_{t-h}) \]

is normally distributed with mean 0 and variance $t - s$.

(2) Each change $B_t - B_s$ is independent of all the previous values $B_u, u \le s$.

(3) Each sample path $B_t, t \ge 0$ is (a.s.) continuous, and has $B_0 = 0$.

Now put

\[ R_t = \mu t + \sigma B_t. \]

It then follows easily that

\[ R_t \sim N(\mu t, \sigma^2 t) \]

i.e. the standard Brownian motion can also be used to model returns processes where
$\mu \ne 0$ and $\sigma \ne 1$. The process $R_t$ is called an arithmetic Brownian motion with drift rate $\mu$
and variance rate $\sigma^2$. We will also refer to $\sigma$ as the volatility.
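A quick simulation can make the statement that R_t has mean mu*t and variance sigma^2*t tangible. The following is a rough sketch under stated assumptions: Brownian motion is approximated on a grid by summing independent normal increments as in property (1), and the parameter values, the number of paths and the seed are arbitrary illustrative choices. Monte Carlo estimates only match the theoretical values up to sampling error.

```python
import math
import random
import statistics


def simulate_abm_endpoint(mu, sigma, T, n_steps, rng):
    """Return R_T = mu*T + sigma*B_T, building B_T from independent N(0, h) increments."""
    h = T / n_steps
    b = 0.0
    for _ in range(n_steps):
        b += rng.gauss(0.0, math.sqrt(h))   # increment B_{s+h} - B_s, normal with variance h
    return mu * T + sigma * b


rng = random.Random(0)           # fixed seed, chosen arbitrarily, for reproducibility
mu, sigma, T = 0.1, 0.2, 1.0     # illustrative values, not taken from the text
samples = [simulate_abm_endpoint(mu, sigma, T, 50, rng) for _ in range(2000)]

sample_mean = statistics.mean(samples)
sample_var = statistics.variance(samples)
print(sample_mean, mu * T)           # estimate versus mu * T
print(sample_var, sigma ** 2 * T)    # estimate versus sigma^2 * T
```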

12.1.2 Modelling Share Prices in Continuous Time 

We have obtained a model for share prices:

\[ S_t = S_0 e^{R_t} = S_0 e^{\mu t + \sigma B_t}. \]

We shall soon see that this translates to a stochastic differential equation

\[ dS_t = \alpha S_t\,dt + \sigma S_t\,dB_t, \qquad \alpha := \mu + \tfrac{1}{2}\sigma^2 \]

i.e. the proportional change in share price $\frac{dS_t}{S_t}$ can be decomposed into two terms, $\alpha\,dt$
and $\sigma\,dB_t$. Such a process is called a geometric Brownian motion (GBM) with drift $\alpha$ and
volatility $\sigma$. The drift is the (proportional) rate at which the share price increases in the
absence of risk. The differential $dB_t$ models the randomness (risk), and the volatility models
how sensitive the share price is to these random events. The greater $\alpha$, the faster the share
price increases in the absence of risk. The greater $\sigma$, the more violently the share price reacts
to random events. Note that $dB_t$ can be negative (unlike $dt$), allowing for decreases in share
price. Also note that $|dB_t| \sim \sqrt{dt} \gg dt$, so that over short periods the change in share price
is dominated by random events. Many of these random events cancel out however, so that in
the long run the drift term is dominant.

Now consider a market with a share St whose price follows a GBM dSt = aS dt + aS dBf. 
Let the risk-free interest rate be r, i.e. the risk-free bank account At satisfies the DE 

$$dA_t = rA_t\,dt$$ 

$A_t$ is the riskless asset. It has drift $r$ and zero volatility. 

A portfolio is a two-dimensional process $(\theta^0_t, \theta^1_t)$, where $\theta^1_t$ is the number of shares owned 
at time $t$, and $\theta^0_t$ is the amount of money in the bank account at time $t$, discounted to time 0. 
Given such a dynamic portfolio $\theta_t = (\theta^0_t, \theta^1_t)$, the value process $V_t(\theta)$ satisfies 

$$dV_t = \theta^0_t\,dA_t + \theta^1_t\,dS_t = (r\theta^0_t A_t + \alpha\theta^1_t S_t)\,dt + \theta^1_t\sigma S_t\,dB_t$$ 

The value of the portfolio at time $T$ is therefore 

$$V_T(\theta) = V_0(\theta) + \int_0^T \left[r\theta^0_t A_t + \alpha\theta^1_t S_t\right] dt + \int_0^T \theta^1_t\sigma S_t\,dB_t$$ 

We now see that we need to be able to evaluate integrals of the form 

$$\int_0^T f(t)\,dB_t$$ 

This is an example of a stochastic integral. The obvious method would be to regard the above 
as a Riemann-Stieltjes (or Lebesgue-Stieltjes) integral. However, it can be shown that this 
approach will not work. Nevertheless, it is possible to define the stochastic integrals, and 
there is even a very simple rule which allows us to manipulate them: Ito's formula. However, 
the rules of stochastic calculus do differ from those of ordinary calculus. We are, after all, now 
working with stochastic processes whose paths are nowhere differentiable, whereas ordinary 
calculus deals only with differentiable functions. 

12.2 A Naive Approach to Stochastic Calculus 

Let $f(x)$ be a (sufficiently) differentiable function on an interval $[a,b]$. Partition this interval: 

$$a = x_0 < x_1 < x_2 < \dots < x_n = b$$ 

where $x_{i+1} - x_i = \Delta x$. Then by Taylor series expansion, we get 

$$f(x_{i+1}) - f(x_i) = f'(x_i)\Delta x + \tfrac12 f''(x_i)(\Delta x)^2 + \tfrac{1}{3!} f'''(x_i)(\Delta x)^3 + \text{terms involving } (\Delta x)^4, (\Delta x)^5, \dots$$ 






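Thus 

$$f(b) - f(a) = \sum_{i=0}^{n-1} \left[f(x_{i+1}) - f(x_i)\right] = \sum_{i=0}^{n-1} f'(x_i)\Delta x + \frac12 \sum_{i=0}^{n-1} f''(x_i)(\Delta x)^2 + \dots$$ 

As $\Delta x \to 0$, we get 

$$f(b) - f(a) = \lim_{\Delta x \to 0} \sum_i f'(x_i)\Delta x + \frac12 \lim_{\Delta x \to 0} \sum_i f''(x_i)(\Delta x)^2 + \dots = \int_a^b f'(x)\,dx + \frac12 \int_a^b f''(x)\,(dx)^2 + \dots$$ 
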

In ordinary calculus, only the first term counts (by the Fundamental Theorem of Calculus), 
and the other terms are zero. This is because the quadratic variation of any "ordinary" 
function is zero, i.e. 

$$\lim_{\Delta x \to 0} \sum (\Delta g)^2 = 0$$ 

for any "ordinary" function $g$. This is not all that hard to see: We have 

$$\lim_{\Delta x \to 0} \sum (\Delta g)^2 = \lim_{\Delta x \to 0} \sum g'(x)^2 (\Delta x)^2 = \Big(\lim_{\Delta x \to 0} \Delta x\Big)\Big(\lim_{\Delta x \to 0} \sum g'(x)^2 \Delta x\Big) = 0 \cdot \int_a^b g'(x)^2\,dx = 0$$ 

(assuming that $g$ is continuously differentiable). 

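But Brownian motion is different: Consider $\Delta B = B_{t+\Delta t} - B_t$. This is a normally 
distributed random variable with $\mathbb{E}[\Delta B] = 0$ and variance $\operatorname{var}(\Delta B) = \Delta t$. 

Consider next the random variable $(\Delta B)^2$. This has 

$$\mathbb{E}[(\Delta B)^2] = \operatorname{var}[\Delta B] = \Delta t$$ 
$$\operatorname{var}[(\Delta B)^2] = \mathbb{E}[(\Delta B)^4] - (\Delta t)^2 = 2(\Delta t)^2 \ll \Delta t$$ 

Thus the variance of $(\Delta B)^2$ is $\approx 0$, i.e. though $\Delta B$ is a random variable, $(\Delta B)^2$ is a constant.¹ 
It follows that 

$$\lim_{\Delta t \to 0} \sum (\Delta B)^2 = \lim_{\Delta t \to 0} \sum \Delta t = T$$ 

where $T$ is the total elapsed time. Thus the quadratic variation of Brownian motion is non-zero. 
Also 

$$\lim_{\Delta t \to 0} \sum \operatorname{var}[(\Delta B)^2] = 2 \lim_{\Delta t \to 0} \sum (\Delta t)^2 = 0$$ 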



¹This nonsense can be made precise, promise. 






because $g(t) = t$ is an "ordinary" function, with quadratic variation zero. Hence we cannot 
ignore the second-order term 

$$\frac12 \int_a^b f''(x)\,(dx)^2$$ 

in the case that $x = B$, but we can ignore all higher-order terms. 
We thus have the following rules for stochastic calculus: 

$$(dB_t)^2 = dt$$ 
$$dB_t \cdot dt = (dt)^2 = 0$$ 
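
The contrast between the two kinds of quadratic variation is easy to see numerically. The sketch below is an illustration added here, not the author's construction: it samples one Brownian path and the smooth function $g(t) = \sin t$ on the same fine grid over $[0, T]$ and sums the squared increments. The choice of $g$, the grid size and the seed are assumptions.

```python
import math
import random

def quadratic_variation(values):
    """Sum of squared increments of a sampled path."""
    return sum((b - a) ** 2 for a, b in zip(values, values[1:]))

rng = random.Random(1)            # fixed seed, only for reproducibility
T, n = 1.0, 100_000
dt = T / n

# One Brownian path on [0, T]: independent N(0, dt) increments.
brownian = [0.0]
for _ in range(n):
    brownian.append(brownian[-1] + rng.gauss(0.0, math.sqrt(dt)))

# An "ordinary" (continuously differentiable) function on the same grid.
smooth = [math.sin(i * dt) for i in range(n + 1)]

qv_brownian = quadratic_variation(brownian)   # close to T
qv_smooth = quadratic_variation(smooth)       # close to 0
print(qv_brownian, qv_smooth)
```

Refining the grid drives the second number to zero while the first stays near $T$, which is the content of the rule $(dB_t)^2 = dt$.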

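Suppose that $f(t,x)$ is a $C^{1,2}$-function, and let $X_t = f(t, B_t)$. Applying these rules to a 
second order Taylor series, we obtain: 

Theorem: (Ito's Formula) 

$$dX_t = \left(\frac{\partial f}{\partial t} + \frac12 \frac{\partial^2 f}{\partial B^2}\right) dt + \frac{\partial f}{\partial B}\,dB_t$$ 

Ordinary calculus shows that for a function $f(t,x)$ we have 

$$df = \frac{\partial f}{\partial t}\,dt + \frac{\partial f}{\partial x}\,dx$$ 

In stochastic calculus, we get another term, due to the non-zero quadratic variation of Brownian motion. 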

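Example 12.2.1 Take our model for stock prices in terms of the returns process: 

$$S_t = e^{R_t}, \qquad R_t = \mu t + \sigma B_t$$ 

Using Ito's formula, we get 

$$dS_t = 0\,dt + e^{R_t}\,dR_t + \tfrac12 e^{R_t}\,(dR_t)^2 = S_t\left[dR_t + \tfrac12 (dR_t)^2\right]$$ 

Now 

$$dR_t = \mu\,dt + \sigma\,dB_t, \qquad (dR_t)^2 = \sigma^2\,dt$$ 

so 

$$dS_t = S_t\left[\mu\,dt + \sigma\,dB_t + \tfrac12\sigma^2\,dt\right] = S_t\left[(\mu + \tfrac12\sigma^2)\,dt + \sigma\,dB_t\right]$$ 

as claimed earlier. 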



□ 
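
A quick numerical sanity check of the drift correction in Example 12.2.1 (a sketch added for illustration, with assumed parameter values): taking expectations in $dS_t = \alpha S_t\,dt + \sigma S_t\,dB_t$ suggests $\mathbb{E}[S_t] = e^{\alpha t}$ with $\alpha = \mu + \tfrac12\sigma^2$ when $S_0 = 1$, so the sample mean of $e^{\mu t + \sigma B_t}$ should match $e^{\alpha t}$ rather than $e^{\mu t}$.

```python
import math
import random
import statistics

rng = random.Random(3)            # fixed seed, only for reproducibility
mu, sigma, t = 0.05, 0.2, 1.0     # illustrative values (assumed)
alpha = mu + 0.5 * sigma ** 2

# S_t = exp(mu*t + sigma*B_t) with B_t ~ N(0, t), so B_t = sqrt(t) * Z.
samples = [math.exp(mu * t + sigma * math.sqrt(t) * rng.gauss(0.0, 1.0))
           for _ in range(200_000)]

sample_mean = statistics.fmean(samples)
print("sample mean of S_t:", sample_mean)
print("exp(alpha*t):      ", math.exp(alpha * t))
print("exp(mu*t):         ", math.exp(mu * t))
```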



Now let's have another look at volatility. The GBM model for stock prices is 

$$dS_t = \alpha S_t\,dt + \sigma S_t\,dB_t$$ 

Thus 

$$\frac{dS_t}{S_t} = \alpha\,dt + \sigma\,dB_t$$ 






and thus $\sigma^2\,dt$ is the variance of the return of the stock over a small period $dt$. 
It follows that $\sigma$ is the standard deviation of the annual return of the stock $S$. This can be 
measured from market data. 

Can we also measure the drift $\alpha$? This is much harder², because over short periods, 
the $dB_t$-term dominates the $dt$-term. The "correct" real-world dynamics of a share price is 
difficult to estimate: We can get the volatility, but not the drift. Amazingly, we don't care, 
as you will see shortly. To calculate option values we need only the volatility, not the drift. 



12.3 The Black-Scholes Model 
12.3.1 The Black-Scholes PDE 

Using Ito's formula, it is not hard to derive a partial differential equation for European style 
derivatives. 

Consider again a market with a share $S_t$ whose price process satisfies the SDE 

$$dS_t = \mu S_t\,dt + \sigma S_t\,dB_t$$ 

Let the risk-free interest rate be $r$, and let $A_t$ be the riskless bank account, with dynamics 

$$dA_t = rA_t\,dt$$ 

Let $V(t, S_t)$ be a European-style derivative whose value depends on both the share price and 
time. Consider a portfolio $\Pi$ which contains 1 derivative, and $n$ shares, i.e. its value is 

$$\Pi_t = V_t + nS_t$$ 

A small amount of time $dt$ later, the share price has changed. The value of the portfolio 
changes by 

$$d\Pi_t = dV_t + n\,dS_t$$ 



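By Ito's Formula, 

$$dV_t = \frac{\partial V}{\partial t}\,dt + \frac{\partial V}{\partial S}\,dS_t + \frac12 \frac{\partial^2 V}{\partial S^2}\,(dS_t)^2 = \left(\frac{\partial V}{\partial t} + \mu S \frac{\partial V}{\partial S} + \frac12 \sigma^2 S^2 \frac{\partial^2 V}{\partial S^2}\right) dt + \sigma S \frac{\partial V}{\partial S}\,dB_t$$ 

Hence 

$$d\Pi_t = \left(\frac{\partial V}{\partial t} + \mu S \frac{\partial V}{\partial S} + \frac12 \sigma^2 S^2 \frac{\partial^2 V}{\partial S^2}\right) dt + \sigma S \frac{\partial V}{\partial S}\,dB_t + n\left(\mu S\,dt + \sigma S\,dB_t\right)$$ 

Thus 

$$d\Pi_t = \left(\frac{\partial V}{\partial t} + \mu S \left(\frac{\partial V}{\partial S} + n\right) + \frac12 \sigma^2 S^2 \frac{\partial^2 V}{\partial S^2}\right) dt + \sigma S \left(\frac{\partial V}{\partial S} + n\right) dB_t$$ 
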

²Martin Davis once claimed, in a talk that I attended, that one would need 1500 years of data to get a 
reasonably accurate estimate of the drift (I'm no statistician, though). 






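Now if we take $n = -\frac{\partial V}{\partial S}$ (i.e. the portfolio is short $\frac{\partial V}{\partial S}$ shares), then the portfolio is 
unaffected by the random changes in stock prices: 

$$d\Pi_t = \left(\frac{\partial V}{\partial t} + \frac12 \sigma^2 S^2 \frac{\partial^2 V}{\partial S^2}\right) dt \qquad (12.1)$$ 

Thus, for a brief moment, the portfolio is risk-free. By a no-arbitrage argument, it must earn 
the same return as the risk-free bank account, i.e. 

$$d\Pi_t = r\Pi_t\,dt = r\left(V - \frac{\partial V}{\partial S} S\right) dt \qquad (12.2)$$ 

Equating (12.1) and (12.2), we get 

$$\frac{\partial V}{\partial t} + \frac12 \sigma^2 S^2 \frac{\partial^2 V}{\partial S^2} + rS \frac{\partial V}{\partial S} - rV = 0$$ 

This is the famous Black-Scholes PDE. It is a second-order parabolic PDE, i.e. essentially 
a heat equation. Most of the PDEs encountered in finance are of a similar type. 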

Note that if a portfolio contains $\frac{\partial V}{\partial S}$ shares, then the change in the portfolio value is the 
same as the change in the value of the derivative. The quantity $\frac{\partial V}{\partial S}$ is called the delta of the 
derivative. One can thus synthetically replicate any European style derivative with underlying 
share $S$ by holding, at any time, delta-many shares. This procedure is called delta hedging. 

Consider a European call option $C$ on a share $S$ with strike $K$ and maturity $T$. The 
volatility of the underlying share is $\sigma$ and the risk-free rate is $r$. To find the value of the 
call option, we must solve the following boundary value problem: 

$$\frac{\partial C}{\partial t} + \frac12 \sigma^2 S^2 \frac{\partial^2 C}{\partial S^2} + rS \frac{\partial C}{\partial S} - rC = 0, \qquad C(T) = \max\{S_T - K, 0\}$$ 

It is now clear why we don't care about the drift $\mu$ of the underlying asset $S$: It does not 
appear in the Black-Scholes PDE, and is therefore irrelevant to pricing derivatives. 

12.3.2 Pricing in the Risk-Neutral World 

In this section we calculate the Black-Scholes prices of vanilla European options. However, 
we use a slightly subtle probabilistic argument, rather than a brute force "solve the PDE" 
approach. 

In the previous section, we deduced the Black-Scholes PDE for a European-style derivative $V$: 

$$\frac{\partial V}{\partial t} + \frac12 \sigma^2 S^2 \frac{\partial^2 V}{\partial S^2} + rS \frac{\partial V}{\partial S} - rV = 0$$ 

Note once again that the drift $\mu$ does not occur in the Black-Scholes PDE, though the volatility 
$\sigma$ does appear. Hence the price of $V$ is independent of $\mu$, i.e. different values of $\mu$ will give 
the same price. 

Thus, for example, the price of a call option on a share $S$ with a given strike $K$ and maturity 
$T$ will be the same whether $\mu$ is very small or very big. This may seem counterintuitive, 



This is the crux of the argument! 






because if $\mu$ is very big, it seems as though the option is more likely to expire in-the-money. 
One would therefore think that the call option price should be an increasing function of $\mu$. 
But your intuition is just plain wrong. 

Since we don't care about the drift rate $\mu$ of an underlying asset, we may as well simplify 
our asset price dynamics by assuming that all assets have the same drift. Now the riskless 
asset (bank account) has drift $r$, and $r$ occurs in the Black-Scholes PDE. We can't change 
the drift of the risk-free bank account without changing the PDE, and thus the solution to 
the pricing problem. So if we want to assume that all assets have the same drift, we have to 
assume that the drift of all assets is the risk-free rate $r$. 

Mathematically, this corresponds to a change of measure, from a real world, unknowable 
probability measure $\mathbb{P}$ to a knowable, risk-neutral measure $\mathbb{Q}$. In the risk-neutral world, the 
dynamics of $S$ are 

$$dS_t = rS_t\,dt + \sigma S_t\,dB_t$$ 

Thus we change the drift of the asset from $\mu$ to $r$. 

Why is the world where all assets have the same return called the risk-neutral world? 
Ordinarily, investors are influenced by risk: They weigh up the expected return against the 
riskiness of an investment. Generally, investors are risk averse, which means that they require 
a premium in order to take on risk. Thus assets with greater riskiness (= volatility) have a 
higher ("real world") expected payoff than assets with less risk. In the risk-neutral world, 
investors are indifferent to risk, i.e. they do not require a risk premium. The only thing they 
care about is expected return. Such investors will buy assets with a higher expected return, 
and sell assets with a lower expected return, regardless of the risk involved. Prices will thus 
adjust so that all assets have the same expected return (in equilibrium). Thus in a world 
where all investors are risk-neutral, all assets will have the same expected return, i.e. the 
same expected return as the risk-free bank account. 

To summarize, prices in the real and risk-neutral worlds are the same. It is just probabilities 
that are changed. Now we can calculate option prices in the risk-neutral world, because 
the asset price dynamics are known, and so is the distribution of future stock prices. 

Now suppose that we can find a portfolio $\Pi$ of traded assets which exactly hedges the 
payoff of a European style derivative $V$, so that 

$$\Pi_T = V_T$$ 

at the derivative's maturity $T$. Such a portfolio is called a replicating portfolio. By the Law 
of One Price, therefore, we must have $\Pi_0 = V_0$, where $\Pi_0, V_0$ are, respectively, the values of 
the replicating portfolio and the derivative at $t = 0$. Thus: 

If a derivative has a replicating portfolio, then 
the value of the derivative equals the value of 
the replicating portfolio. 

Now in the Black-Scholes model, any European style derivative has a replicating portfolio: 
A portfolio consisting, at any time, of $\Delta = \frac{\partial V}{\partial S}$ shares will exactly replicate the derivative $V$ 
(delta hedging). 

$\Pi_T$ and $V_T$ are random variables. But since they are identical, they must have the same 
expectation, in any world. Since the expected return of all traded assets is $r$ in the risk-neutral 
world, and since $\Pi$ consists entirely of traded assets, the expected return of $\Pi$ is also 






$r$: 

$$\mathbb{E}_{RN}[\Pi_T] = \Pi_0 e^{rT}$$ 

where $\Pi_0$ is the value of the portfolio at $t = 0$. Now since $\Pi_0 = V_0$ (by the Law of One Price) 
and $\Pi_T = V_T$ (because $\Pi$ is a replicating portfolio of $V$), we see that 

$$V_0 = e^{-rT}\,\mathbb{E}_{RN}[V_T]$$ 

We have therefore discovered the following procedure for valuing a derivative $V$. 

(i) Assume that the drift of the underlying asset $S$ is $r$ (instead of $\mu$). This moves us from 
the real world to the risk-neutral world. 

(ii) Calculate the expected pay-off of $V$ at maturity in the risk-neutral world: $\mathbb{E}_{RN}[V_T]$ 

(iii) Discount to the present to get the price today: 

$$V_0 = e^{-rT}\,\mathbb{E}_{RN}[V_T]$$ 

The point is that we can't calculate $\mathbb{E}_{real}[V_T]$, because we do not know the distribution 
of the underlying $S_T$ in the real world. However, we can calculate $\mathbb{E}_{RN}[V_T]$: Since we know 
the drift of $S_t$ in the risk-neutral world, we can calculate the distribution of $S_T$ here. This 
brings us to our next topic. 

12.3.3 The Distribution of Asset Prices 

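We have postulated an asset price model 

$$\frac{dS_t}{S_t} = \mu\,dt + \sigma\,dB_t$$ 

where $\mu = r$ in the risk-neutral world. Consider now the function $Y_t = f(S_t) = \ln S_t$. By 
Ito's formula, 

$$dY_t = \frac{1}{S_t}\,dS_t - \frac12 \frac{1}{S_t^2}\,(dS_t)^2 = (\mu\,dt + \sigma\,dB_t) - \frac12 \sigma^2\,dt = (\mu - \tfrac12\sigma^2)\,dt + \sigma\,dB_t$$ 

using $(dB_t)^2 = dt$, $(dt)^2 = 0 = (dB_t)(dt)$. 

So $Y_t = \ln S_t$ follows a Brownian motion with drift. We can easily solve this SDE to get 

$$Y_T - Y_0 = (\mu - \tfrac12\sigma^2)T + \sigma B_T$$ 

which implies that $Y_T$ is normally distributed with mean $Y_0 + (\mu - \tfrac12\sigma^2)T$ and variance $\sigma^2 T$: 

$$Y_T \sim N\left(Y_0 + (\mu - \tfrac12\sigma^2)T,\ \sigma^2 T\right)$$ 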

Thus the log of the stock price is normally distributed. We say that stock prices are 
lognormally distributed (in the Black-Scholes model) . 
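
The lognormal law can be checked against a direct discretisation of the SDE. The sketch below uses the Euler-Maruyama scheme, which is my choice of method and does not appear in the notes: it steps $S_{k+1} = S_k + \mu S_k h + \sigma S_k \Delta B$ and then compares the sample mean and variance of $\ln S_T$ with $\ln S_0 + (\mu - \tfrac12\sigma^2)T$ and $\sigma^2 T$. All parameter values are assumed for illustration.

```python
import math
import random
import statistics

def euler_maruyama_terminal(s0, mu, sigma, T, n_steps, rng):
    """Approximate S_T for dS = mu*S dt + sigma*S dB by Euler-Maruyama."""
    h = T / n_steps
    s = s0
    for _ in range(n_steps):
        dB = rng.gauss(0.0, math.sqrt(h))   # Brownian increment ~ N(0, h)
        s += mu * s * h + sigma * s * dB
    return s

rng = random.Random(2)                       # fixed seed for reproducibility
s0, mu, sigma, T = 100.0, 0.08, 0.25, 1.0    # illustrative values (assumed)

logs = [math.log(euler_maruyama_terminal(s0, mu, sigma, T, 250, rng))
        for _ in range(5000)]

mean_log = statistics.fmean(logs)
var_log = statistics.variance(logs)
predicted_mean = math.log(s0) + (mu - 0.5 * sigma ** 2) * T
predicted_var = sigma ** 2 * T
print(mean_log, predicted_mean)
print(var_log, predicted_var)
```

Note that the mean of $\ln S_T$ grows at rate $\mu - \tfrac12\sigma^2$, not $\mu$; the simulation makes the Ito correction visible.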

Definition 12.3.1 A random variable $X$ is said to be lognormally distributed if and only 
if the random variable $Y = \ln X$ is normally distributed. Equivalently, if $X$ is normally 
distributed, then $e^X$ is lognormally distributed. 

□ 






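The density function of a lognormal variable 

Suppose that $X \sim N(\mu_X, \sigma_X^2)$, so that $X$ has density 

$$f_X(x) = \frac{1}{\sqrt{2\pi\sigma_X^2}}\, e^{-(x-\mu_X)^2/2\sigma_X^2}$$ 

Let $Y = e^X$, so that $Y$ is lognormally distributed. Let $F_Y$, $f_Y$ be, respectively, the distribution 
and density functions of $Y$. Then 

$$F_Y(y) = \mathbb{P}(Y \le y) = \mathbb{P}(X \le \ln y) = F_X(\ln y)$$ 

Differentiating, 

$$f_Y(y) = F_Y'(y) = F_X'(\ln y)\,\frac{1}{y} = \frac{1}{y}\, f_X(\ln y)$$ 

Thus: 

The density of a lognormal random variable $Y$ is given by 

$$f_Y(y) = \begin{cases} \dfrac{1}{y}\,\dfrac{1}{\sqrt{2\pi\sigma_X^2}} \exp\left(-\dfrac{(\ln y - \mu_X)^2}{2\sigma_X^2}\right) & \text{if } y > 0 \\[2ex] 0 & \text{if } y \le 0 \end{cases}$$ 

where $X = \ln Y$ and $X \sim N(\mu_X, \sigma_X^2)$. Moreover, the mean 
$\mu_Y$ and variance $\sigma_Y^2$ of $Y$ are given by 

$$\mu_Y = e^{\mu_X + \frac12\sigma_X^2}, \qquad \sigma_Y^2 = e^{2\mu_X + \sigma_X^2}\left(e^{\sigma_X^2} - 1\right)$$ 

The statements about the mean and variance of a lognormal random variable are left as 
straightforward exercises in integration. For example, you must show that 

$$\mathbb{E}[Y] = \frac{1}{\sqrt{2\pi\sigma_X^2}} \int_0^\infty y \cdot \frac{1}{y}\, e^{-(\ln y - \mu_X)^2/2\sigma_X^2}\,dy = e^{\mu_X + \frac12\sigma_X^2}$$ 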
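
Before doing the integration exercise by hand, it can be reassuring to confirm the answers numerically. The following sketch (an illustration with assumed parameter values; the midpoint rule and the truncation of the range at `upper` are my choices) integrates the lognormal density and compares the resulting mean and variance with the standard closed forms $e^{\mu_X + \frac12\sigma_X^2}$ and $e^{2\mu_X + \sigma_X^2}(e^{\sigma_X^2} - 1)$.

```python
import math

def lognormal_pdf(y, mu_x, sigma_x):
    """Density of Y = e^X with X ~ N(mu_x, sigma_x^2)."""
    if y <= 0.0:
        return 0.0
    z = (math.log(y) - mu_x) / sigma_x
    return math.exp(-0.5 * z * z) / (y * sigma_x * math.sqrt(2.0 * math.pi))

def midpoint_integral(f, a, b, n):
    """Composite midpoint rule on [a, b] with n subintervals."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

mu_x, sigma_x = 0.1, 0.4     # illustrative values (assumed)
upper, n = 50.0, 200_000     # the tail beyond `upper` is negligible here

mass = midpoint_integral(lambda y: lognormal_pdf(y, mu_x, sigma_x), 0.0, upper, n)
mean = midpoint_integral(lambda y: y * lognormal_pdf(y, mu_x, sigma_x), 0.0, upper, n)
second = midpoint_integral(lambda y: y * y * lognormal_pdf(y, mu_x, sigma_x), 0.0, upper, n)
variance = second - mean ** 2

exact_mean = math.exp(mu_x + 0.5 * sigma_x ** 2)
exact_var = math.exp(2 * mu_x + sigma_x ** 2) * (math.exp(sigma_x ** 2) - 1.0)
print(mass, mean, exact_mean, variance, exact_var)
```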

We can now prove the following important fact: 

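Theorem 12.3.2 Suppose that $Y$ is lognormally distributed, where $\ln Y \sim N(m, s^2)$. Let $K$ 
be a positive constant. Then 

$$\mathbb{P}(Y > K) = N(d_-)$$ 
$$\mathbb{E}[\max\{Y - K, 0\}] = \mathbb{E}[Y]\,N(d_+) - K\,N(d_-)$$ 

where 

$$d_\pm = \frac{\ln[\mathbb{E}(Y)/K] \pm \frac12 s^2}{s}$$ 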






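Proof: Since $\ln Y \sim N(m, s^2)$, it follows that if we define $X = \frac{\ln Y - m}{s}$ then $X \sim N(0,1)$, 
i.e. $X$ is a standard normal random variable. Clearly 

$$\mathbb{P}(Y > K) = \mathbb{P}(\ln Y > \ln K) = \mathbb{P}\left(X > \frac{\ln K - m}{s}\right) = 1 - N\left(\frac{\ln K - m}{s}\right) = N\left(\frac{m - \ln K}{s}\right)$$ 

where $N(x)$ is the distribution function of a standard normal random variable, and we used 
the fact that $1 - N(x) = N(-x)$. 

But we know that $\mathbb{E}[Y] = e^{m + \frac12 s^2}$, so that $m = \ln \mathbb{E}[Y] - \frac12 s^2$. We thus obtain 

$$\mathbb{P}(Y > K) = N\left(\frac{\ln(\mathbb{E}[Y]/K) - \frac12 s^2}{s}\right) = N(d_-)$$ 

as required. 

Now $\mathbb{E}[\max\{Y - K, 0\}]$ is an integral which can be split up into two parts. In the first 
region, $Y > K$, so that $\max\{Y - K, 0\} = Y - K$ (in that region). In the second region, 
$Y \le K$, so that $\max\{Y - K, 0\} = 0$. Thus 

$$\mathbb{E}[\max\{Y - K, 0\}] = \int_K^\infty (y - K) f(y)\,dy$$ 

where $f(y)$ is the density function of $Y$. It is simpler to work with $X$, however, so we change 
variables: Put $x = \frac{\ln y - m}{s}$. Then $y = e^{sx + m}$, and 

$$\mathbb{E}[\max\{Y - K, 0\}] = \mathbb{E}[\max\{e^{sX + m} - K, 0\}] = \int_{(\ln K - m)/s}^\infty (e^{sx + m} - K)\, g(x)\,dx$$ 

where $g(x)$ is the density of the standard normal random variable $X$. We can split this up 
into two integrals: 

$$\int_{(\ln K - m)/s}^\infty e^{sx + m} g(x)\,dx \;-\; K \int_{(\ln K - m)/s}^\infty g(x)\,dx$$ 

We simplify the integrand of the first integral by completing the square: 

$$e^{sx + m} g(x) = \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2 + sx + m} = \frac{1}{\sqrt{2\pi}}\, e^{-(x^2 - 2sx + s^2)/2}\, e^{m + s^2/2} = e^{m + s^2/2}\, g(x - s) = \mathbb{E}[Y]\, g(x - s)$$ 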






where we used the fact that $\mathbb{E}[Y] = e^{m + s^2/2}$. Thus the first integral becomes 

$$\int_{(\ln K - m)/s}^\infty e^{sx + m} g(x)\,dx = \mathbb{E}[Y] \int_{(\ln K - m)/s}^\infty g(x - s)\,dx$$ 

and the integral $\int_a^\infty g(x - s)\,dx$ is just the probability that a standard normal random variable is 
greater than $a - s$, which is $1 - N(a - s) = N(s - a)$. Thus 

$$\int_{(\ln K - m)/s}^\infty e^{sx + m} g(x)\,dx = \mathbb{E}[Y]\, N\left(s - \frac{\ln K - m}{s}\right) = \mathbb{E}[Y]\, N(d_+)$$ 

using $m = \ln \mathbb{E}[Y] - \frac12 s^2$. 

Similarly, but rather more easily, it can be shown that 

$$-K \int_{(\ln K - m)/s}^\infty g(x)\,dx = -K\,N(d_-)$$ 

and this completes the proof. 
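
Theorem 12.3.2 can be cross-checked numerically. In the sketch below (illustrative only; the function names, the values of $m$, $s$, $K$ and the midpoint rule are assumptions), the closed form $\mathbb{E}[Y]N(d_+) - KN(d_-)$ is compared with a direct numerical evaluation of $\int \max\{e^{sx+m} - K, 0\}\, g(x)\,dx$ over the standard normal density $g$.

```python
import math

def norm_cdf(x):
    """Standard normal distribution function N(x), via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def norm_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def lognormal_call_expectation(m, s, K):
    """E[max(Y - K, 0)] for ln Y ~ N(m, s^2), using Theorem 12.3.2."""
    mean_y = math.exp(m + 0.5 * s * s)
    d_plus = (math.log(mean_y / K) + 0.5 * s * s) / s
    d_minus = (math.log(mean_y / K) - 0.5 * s * s) / s
    return mean_y * norm_cdf(d_plus) - K * norm_cdf(d_minus)

def numeric_call_expectation(m, s, K, lo=-10.0, hi=10.0, n=200_000):
    """Same expectation by the midpoint rule in the variable x = (ln y - m)/s."""
    h = (hi - lo) / n
    total = 0.0
    for i in range(n):
        x = lo + (i + 0.5) * h
        total += max(math.exp(s * x + m) - K, 0.0) * norm_pdf(x)
    return total * h

m, s, K = 0.0, 0.5, 1.0          # illustrative values (assumed)
closed_form = lognormal_call_expectation(m, s, K)
numeric = numeric_call_expectation(m, s, K)
print(closed_form, numeric)
```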



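The distribution of asset prices 

We have 

$$\frac{dS_t}{S_t} = \mu\,dt + \sigma\,dB_t$$ 

which we solved to obtain 

$$\ln S_t \sim N\left(\ln S_0 + (\mu - \tfrac12\sigma^2)t,\ \sigma^2 t\right)$$ 

Thus $S_t = e^X$, where $X \sim N(\mu_X, \sigma_X^2)$ and 

$$\mu_X = \ln S_0 + (\mu - \tfrac12\sigma^2)t, \qquad \sigma_X^2 = \sigma^2 t$$ 

So the density of $S_t$ is 

$$f(S) = \frac{1}{S\sqrt{2\pi\sigma^2 t}} \exp\left(-\frac{\left[\ln S - (\ln S_0 + (\mu - \tfrac12\sigma^2)t)\right]^2}{2\sigma^2 t}\right)$$ 

and 

$$\mathbb{E}[S_t] = S_0 e^{\mu t}, \qquad \operatorname{var}[S_t] = S_0^2 e^{2\mu t}\left(e^{\sigma^2 t} - 1\right)$$ 

Replacing $\mu$ with $r$ will give the density of $S_t$ in the risk-neutral world. 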

Example 12.3.3 Consider a (long) forward contract $F$ on an asset with forward price $K = S_0 e^{rT}$. 
The payoff of $F$ at $T$ is $S_T - K$. Thus the value of the contract today is 

$$F_0 = e^{-rT}\,\mathbb{E}_{RN}[S_T - K]$$ 






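In the risk-neutral world, the asset price dynamics are given by 

$$\frac{dS_t}{S_t} = r\,dt + \sigma\,dB_t \qquad\text{i.e.}\qquad S_T = S_0\, e^{(r - \frac12\sigma^2)T + \sigma B_T}$$ 

Thus $\mathbb{E}_{RN}[S_T] = S_0 e^{rT}$, and so the value of the forward contract is 

$$F_0 = e^{-rT}\left[S_0 e^{rT} - K\right] = 0$$ 

which is the correct value obtained by the (presumably familiar) static replication argument. 
If we had used the "real world" drift however, we would have obtained 

$$F_0 = e^{-rT}\left[S_0 e^{\mu T} - K\right] = S_0\left(e^{(\mu - r)T} - 1\right)$$ 

and this is incorrect. 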

□ 
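
The three-step valuation procedure can be tried on the forward contract of Example 12.3.3 by Monte Carlo. This is a sketch under assumed parameter values, not part of the notes: terminal prices are drawn from the lognormal law derived above, once with drift $r$ and once with a "real world" drift $\mu$, and the discounted average payoff is computed in each case.

```python
import math
import random
import statistics

def discounted_forward_value(s0, K, r, drift, sigma, T, n_paths, rng):
    """e^{-rT} * average of (S_T - K), with S_T lognormal with the given drift."""
    payoffs = []
    for _ in range(n_paths):
        z = rng.gauss(0.0, 1.0)
        s_T = s0 * math.exp((drift - 0.5 * sigma ** 2) * T + sigma * math.sqrt(T) * z)
        payoffs.append(s_T - K)
    return math.exp(-r * T) * statistics.fmean(payoffs)

rng = random.Random(4)                                     # fixed seed
s0, r, mu, sigma, T = 100.0, 0.05, 0.12, 0.2, 1.0          # illustrative (assumed)
K = s0 * math.exp(r * T)                                   # forward price

value_risk_neutral = discounted_forward_value(s0, K, r, r, sigma, T, 200_000, rng)
value_real_drift = discounted_forward_value(s0, K, r, mu, sigma, T, 200_000, rng)
print("drift r :", value_risk_neutral)    # close to 0, the correct value
print("drift mu:", value_real_drift)      # close to s0*(exp((mu-r)T) - 1), incorrect
```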

Exercise 12.3.4 Recall that the value of a forward contract at time $t$ is 

$$F_t = S_t - S_0 e^{rt}$$ 

Show that $F_t$ satisfies the BS PDE with boundary condition $F_0 = 0$. 

By the results in the previous section, we have 

$$\mathbb{E}[S_t] = S_0 e^{\mu t}$$ 

So $\mu$ is the expected rate of return of the asset $S$. In particular, in the risk-neutral world 
the drift of every traded asset is $\mu = r$, so in the risk-neutral world all assets have the same 
expected rate of return $r$ (which we already knew). 

Also, $\mathbb{E}\left[\left(\frac{dS_t}{S_t}\right)^2\right] = \sigma^2\,dt$, which shows that $\sigma^2\,dt$ is the variance of returns over a period 
$dt$. We can thus interpret $\sigma$ to be the standard deviation of the returns on $S$ over a period of 
one year. 



□ 



12.4 Option Pricing: The Black-Scholes Formula 

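Now that we have the density function of the asset price $S_T$ in the risk-neutral world, we can 
price practically any European claim $V$ with payoff $\Phi(S_T)$: 

$$V_0 = e^{-rT}\,\mathbb{E}_{RN}[\Phi(S_T)] = \frac{e^{-rT}}{\sqrt{2\pi\sigma^2 T}} \int_0^\infty \Phi(S)\,\frac{1}{S} \exp\left(-\frac{\left[\ln S - (\ln S_0 + (r - \tfrac12\sigma^2)T)\right]^2}{2\sigma^2 T}\right) dS$$ 
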



It is easy to evaluate this integral numerically, using Simpson's method, for example. 
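
As a concrete illustration of the numerical route, here is a minimal sketch using the composite Simpson rule. One assumption to note: instead of integrating over $S$ directly, the code substitutes $\ln S = \ln S_0 + (r - \tfrac12\sigma^2)T + \sigma\sqrt{T}\,z$, so that the integral becomes one against the standard normal density in $z$ over a truncated range $[-8, 8]$. The function names, the truncation width and the parameter values are all illustrative choices.

```python
import math

def simpson(f, a, b, n):
    """Composite Simpson's rule on [a, b] with n (even) subintervals."""
    if n % 2:
        raise ValueError("n must be even")
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0

def price_european_claim(payoff, s0, r, sigma, T, n=2000, width=8.0):
    """e^{-rT} E_RN[payoff(S_T)] for lognormal S_T, via Simpson's rule in z."""
    drift = math.log(s0) + (r - 0.5 * sigma ** 2) * T
    vol = sigma * math.sqrt(T)

    def integrand(z):
        density = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
        return payoff(math.exp(drift + vol * z)) * density

    return math.exp(-r * T) * simpson(integrand, -width, width, n)

s0, K, r, sigma, T = 100.0, 100.0, 0.05, 0.2, 1.0     # illustrative (assumed)
asset_value = price_european_claim(lambda s: s, s0, r, sigma, T)
call_value = price_european_claim(lambda s: max(s - K, 0.0), s0, r, sigma, T)
print(asset_value)   # the claim paying S_T should be worth S_0
print(call_value)    # European call with strike K
```

The claim paying $S_T$ itself is a useful test case, since its price must be $S_0$.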

Consider next a call option with strike $K$ and maturity $T$. In this case, $\Phi(S_T) = \max\{S_T - K, 0\}$. Thus: 

$$C_0 = e^{-rT}\,\mathbb{E}_{RN}[\max\{S_T - K, 0\}]$$ 






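Now in the risk-neutral world, $S_T$ is lognormally distributed, with $\ln S_T \sim N(\ln S_0 + (r - \tfrac12\sigma^2)T,\ \sigma^2 T)$. 
By Theorem 12.3.2, therefore, 

$$\mathbb{E}_{RN}[\max\{S_T - K, 0\}] = \mathbb{E}_{RN}[S_T]\,N(d_+) - K\,N(d_-)$$ 

where 

$$d_\pm = \frac{\ln[\mathbb{E}(S_T)/K] \pm \frac12\sigma^2 T}{\sigma\sqrt{T}}$$ 

But $\mathbb{E}_{RN}[S_T] = S_0 e^{rT}$, and thus (remembering to discount): 

$$C_0 = S_0\,N(d_+) - Ke^{-rT}\,N(d_-)$$ 

where 

$$d_\pm = \frac{\ln\frac{S_0}{K} + (r \pm \frac12\sigma^2)T}{\sigma\sqrt{T}}$$ 

and $N(x)$ is the distribution function of a standard normal 
random variable, i.e. 

$$N(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^x e^{-t^2/2}\,dt$$ 
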



The normal distribution function N{x) can be determined from tables, or by using the 
Excel function NORMSDIST. 
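
Outside a spreadsheet, $N(x)$ can be obtained from the error function through the identity $N(x) = \tfrac12\left(1 + \operatorname{erf}(x/\sqrt2)\right)$. The following sketch (function names and input values are illustrative assumptions) implements the Black-Scholes call formula above in this way.

```python
import math

def norm_cdf(x):
    """Standard normal distribution function N(x), via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def black_scholes_call(s0, k, r, sigma, T):
    """C_0 = S_0 N(d+) - K e^{-rT} N(d-)."""
    d_plus = (math.log(s0 / k) + (r + 0.5 * sigma ** 2) * T) / (sigma * math.sqrt(T))
    d_minus = d_plus - sigma * math.sqrt(T)
    return s0 * norm_cdf(d_plus) - k * math.exp(-r * T) * norm_cdf(d_minus)

# Illustrative inputs (assumed): at-the-money call, one year to maturity.
call = black_scholes_call(s0=100.0, k=100.0, r=0.05, sigma=0.2, T=1.0)
print(round(call, 4))
```

For these inputs $d_+ = 0.35$ and $d_- = 0.15$, so the result can also be reproduced by hand from the table of $N(x)$.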






  x     0.00     0.01     0.02     0.03     0.04     0.05     0.06     0.07     0.08     0.09
 0.0  0.50000  0.50399  0.50798  0.51197  0.51595  0.51994  0.52392  0.52790  0.53188  0.53586
 0.1  0.53983  0.54380  0.54776  0.55172  0.55567  0.55962  0.56356  0.56749  0.57142  0.57535
 0.2  0.57926  0.58317  0.58706  0.59095  0.59483  0.59871  0.60257  0.60642  0.61026  0.61409
 0.3  0.61791  0.62172  0.62552  0.62930  0.63307  0.63683  0.64058  0.64431  0.64803  0.65173
 0.4  0.65542  0.65910  0.66276  0.66640  0.67003  0.67364  0.67724  0.68082  0.68439  0.68793
 0.5  0.69146  0.69497  0.69847  0.70194  0.70540  0.70884  0.71226  0.71566  0.71904  0.72240
 0.6  0.72575  0.72907  0.73237  0.73565  0.73891  0.74215  0.74537  0.74857  0.75175  0.75490
 0.7  0.75804  0.76115  0.76424  0.76730  0.77035  0.77337  0.77637  0.77935  0.78230  0.78524
 0.8  0.78814  0.79103  0.79389  0.79673  0.79955  0.80234  0.80511  0.80785  0.81057  0.81327
 0.9  0.81594  0.81859  0.82121  0.82381  0.82639  0.82894  0.83147  0.83398  0.83646  0.83891
 1.0  0.84134  0.84375  0.84614  0.84849  0.85083  0.85314  0.85543  0.85769  0.85993  0.86214
 1.1  0.86433  0.86650  0.86864  0.87076  0.87286  0.87493  0.87698  0.87900  0.88100  0.88298
 1.2  0.88493  0.88686  0.88877  0.89065  0.89251  0.89435  0.89617  0.89796  0.89973  0.90147
 1.3  0.90320  0.90490  0.90658  0.90824  0.90988  0.91149  0.91309  0.91466  0.91621  0.91774
 1.4  0.91924  0.92073  0.92220  0.92364  0.92507  0.92647  0.92785  0.92922  0.93056  0.93189
 1.5  0.93319  0.93448  0.93574  0.93699  0.93822  0.93943  0.94062  0.94179  0.94295  0.94408
 1.6  0.94520  0.94630  0.94738  0.94845  0.94950  0.95053  0.95154  0.95254  0.95352  0.95449
 1.7  0.95543  0.95637  0.95728  0.95818  0.95907  0.95994  0.96080  0.96164  0.96246  0.96327
 1.8  0.96407  0.96485  0.96562  0.96638  0.96712  0.96784  0.96856  0.96926  0.96995  0.97062
 1.9  0.97128  0.97193  0.97257  0.97320  0.97381  0.97441  0.97500  0.97558  0.97615  0.97670
 2.0  0.97725  0.97778  0.97831  0.97882  0.97932  0.97982  0.98030  0.98077  0.98124  0.98169
 2.1  0.98214  0.98257  0.98300  0.98341  0.98382  0.98422  0.98461  0.98500  0.98537  0.98574
 2.2  0.98610  0.98645  0.98679  0.98713  0.98745  0.98778  0.98809  0.98840  0.98870  0.98899
 2.3  0.98928  0.98956  0.98983  0.99010  0.99036  0.99061  0.99086  0.99111  0.99134  0.99158
 2.4  0.99180  0.99202  0.99224  0.99245  0.99266  0.99286  0.99305  0.99324  0.99343  0.99361
 2.5  0.99379  0.99396  0.99413  0.99430  0.99446  0.99461  0.99477  0.99492  0.99506  0.99520
 2.6  0.99534  0.99547  0.99560  0.99573  0.99585  0.99598  0.99609  0.99621  0.99632  0.99643
 2.7  0.99653  0.99664  0.99674  0.99683  0.99693  0.99702  0.99711  0.99720  0.99728  0.99736
 2.8  0.99744  0.99752  0.99760  0.99767  0.99774  0.99781  0.99788  0.99795  0.99801  0.99807
 2.9  0.99813  0.99819  0.99825  0.99831  0.99836  0.99841  0.99846  0.99851  0.99856  0.99861
 3.0  0.99865  0.99869  0.99874  0.99878  0.99882  0.99886  0.99889  0.99893  0.99896  0.99900
 3.1  0.99903  0.99906  0.99910  0.99913  0.99916  0.99918  0.99921  0.99924  0.99926  0.99929
 3.2  0.99931  0.99934  0.99936  0.99938  0.99940  0.99942  0.99944  0.99946  0.99948  0.99950
 3.3  0.99952  0.99953  0.99955  0.99957  0.99958  0.99960  0.99961  0.99962  0.99964  0.99965
 3.4  0.99966  0.99968  0.99969  0.99970  0.99971  0.99972  0.99973  0.99974  0.99975  0.99976
 3.5  0.99977  0.99978  0.99978  0.99979  0.99980  0.99981  0.99981  0.99982  0.99983  0.99983
 3.6  0.99984  0.99985  0.99985  0.99986  0.99986  0.99987  0.99987  0.99988  0.99988  0.99989
 3.7  0.99989  0.99990  0.99990  0.99990  0.99991  0.99991  0.99992  0.99992  0.99992  0.99992
 3.8  0.99993  0.99993  0.99993  0.99994  0.99994  0.99994  0.99994  0.99995  0.99995  0.99995
 3.9  0.99995  0.99995  0.99996  0.99996  0.99996  0.99996  0.99996  0.99996  0.99997  0.99997
 4.0  0.99997  0.99997  0.99997  0.99997  0.99997  0.99997  0.99998  0.99998  0.99998  0.99998

Table for $N(x)$ when $x \ge 0$. 



Notes: 

(1) Note that $N(-x) = 1 - N(x)$. 

(2) Use linear interpolation to calculate the values of $N(x)$: For example, 

$$N(0.8625) = N(0.86 + 0.25(0.87 - 0.86)) \approx N(0.86) + 0.25\,[N(0.87) - N(0.86)]$$ 
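
The interpolation in note (2) can be carried out and compared with a direct evaluation of $N(x)$. In this sketch the two tabulated values $N(0.86) = 0.80511$ and $N(0.87) = 0.80785$ are read from the table; the helper names are illustrative, and the "exact" value uses the error function identity $N(x) = \tfrac12(1 + \operatorname{erf}(x/\sqrt2))$.

```python
import math

def norm_cdf(x):
    """Standard normal distribution function N(x), via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def interpolate(x, x0, n0, x1, n1):
    """Linear interpolation between tabulated points (x0, n0) and (x1, n1)."""
    weight = (x - x0) / (x1 - x0)
    return n0 + weight * (n1 - n0)

# Tabulated values taken from the table rows 0.8, columns 0.06 and 0.07.
approx = interpolate(0.8625, 0.86, 0.80511, 0.87, 0.80785)
exact = norm_cdf(0.8625)
print(approx, exact, abs(approx - exact))
```

The difference is of the same order as the rounding of the table entries, so linear interpolation loses essentially nothing here.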






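Exercise 12.4.1 Verify that the formula for the call option above is a solution to the BS 
PDE. Begin by showing that (with $S_0$ replaced by the current share price $S$, and $T$ by the 
time to maturity $T - t$) 

$$\Delta:\quad \frac{\partial C}{\partial S} = N(d_+)$$ 

$$\Gamma:\quad \frac{\partial^2 C}{\partial S^2} = \frac{N'(d_+)}{S\sigma\sqrt{T - t}}$$ 

$$\Theta:\quad \frac{\partial C}{\partial t} = -\frac{S\sigma N'(d_+)}{2\sqrt{T - t}} - rKe^{-r(T - t)}N(d_-)$$ 

□ 

The partial derivatives in the above exercise are known as the Greeks, and are measures 
of the sensitivity of an option to its parameters. Other Greeks are 

$$\text{Rho:}\quad \frac{\partial C}{\partial r} \qquad\qquad \text{Vega:}\quad \frac{\partial C}{\partial \sigma}$$ 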



We can obtain the formula of a put option in the same way that we derived the formula 
for a call option, i.e. via a long and complicated chain of integrations. However, it is more 
intelligent to use put-call parity: 

$$P_0 = C_0 + Ke^{-rT} - S_0 = S_0[N(d_+) - 1] + Ke^{-rT}[1 - N(d_-)] = -S_0 N(-d_+) + Ke^{-rT} N(-d_-)$$ 

i.e. 

$$P_0 = -S_0\,N(-d_+) + Ke^{-rT}\,N(-d_-)$$ 
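
The two routes to the put price, via put-call parity and via the closed formula, can be compared numerically. The sketch below is illustrative (function names and inputs are assumptions); it repeats a small call pricer so that the block is self-contained.

```python
import math

def norm_cdf(x):
    """Standard normal distribution function N(x), via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def d_plus_minus(s0, k, r, sigma, T):
    d_plus = (math.log(s0 / k) + (r + 0.5 * sigma ** 2) * T) / (sigma * math.sqrt(T))
    return d_plus, d_plus - sigma * math.sqrt(T)

def call_price(s0, k, r, sigma, T):
    d_plus, d_minus = d_plus_minus(s0, k, r, sigma, T)
    return s0 * norm_cdf(d_plus) - k * math.exp(-r * T) * norm_cdf(d_minus)

def put_price_parity(s0, k, r, sigma, T):
    """P_0 = C_0 + K e^{-rT} - S_0 (put-call parity)."""
    return call_price(s0, k, r, sigma, T) + k * math.exp(-r * T) - s0

def put_price_formula(s0, k, r, sigma, T):
    """P_0 = -S_0 N(-d+) + K e^{-rT} N(-d-)."""
    d_plus, d_minus = d_plus_minus(s0, k, r, sigma, T)
    return -s0 * norm_cdf(-d_plus) + k * math.exp(-r * T) * norm_cdf(-d_minus)

args = (100.0, 100.0, 0.05, 0.2, 1.0)    # illustrative inputs (assumed)
put_a = put_price_parity(*args)
put_b = put_price_formula(*args)
print(put_a, put_b)
```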



Finally, as a curiosity, we mention binary or digital options: 



Definition 12.4.2 • A binary call on $S$ with strike $K$ will pay one unit of currency if 
$S_T > K$ at expiry, and nothing otherwise. 

• A binary put will pay 1 if $S_T < K$, and nothing otherwise. 

□ 

Let $B_c$ be a binary call with strike $K$. The boundary value problem for $B_c$ is 

$$\frac{\partial B_c}{\partial t} + \frac12\sigma^2 S^2 \frac{\partial^2 B_c}{\partial S^2} + rS \frac{\partial B_c}{\partial S} - rB_c = 0, \qquad B_c(T) = I_{\{S_T > K\}}$$ 






Exercise 12.4.3 (1) Show that the solution of this BVP is given by 

$$B_c(0) = e^{-rT} N(d_-)$$ 

where as before 

$$d_- = \frac{\ln\frac{S_0}{K} + (r - \frac12\sigma^2)T}{\sigma\sqrt{T}}$$ 

(2) You should be able to obtain a put-call parity for binary options, and then deduce the 
value of a binary put from that. 

□ 



Chapter 13 

Introduction to PDEs 



In this chapter we examine what partial differential equations (PDEs) are, how they can be 
classified, and what is meant by a solution of a PDE. 



13.1 What is a PDE? 



An ODE is an equation involving a function of one variable and its derivatives. A PDE 
is an equation involving a function $u(x, y, \dots)$ of more than one variable, and its (partial) 
derivatives. They are used to model not only physical phenomena such as wave motion, heat 
conduction, fluid dynamics, electromagnetism etc., but are also used in the biological and 
economic sciences. In stochastic analysis, Markov processes are closely associated with PDEs. 



Here are a few examples: 

  $u_t + u_x = 0$                                              transport eqn.        $u = u(t, x)$ 
  $u_t = u_{xx}$                                               1D heat eqn.          $u = u(t, x)$ 
  $u_t = u_{xx} + u_{yy} + u_{zz}$                             3D heat eqn.          $u = u(t, x, y, z)$ 
  $u_{xx} + u_{yy} = 0$                                        2D Laplace's eqn.     $u = u(x, y)$ 
  $u_{tt} = u_{xx} + u_{yy} + u_{zz}$                          3D wave eqn.          $u = u(t, x, y, z)$ 
  $u_x + uu_y = 0$                                             shock wave            $u = u(x, y)$ 
  $u_t + uu_x + u_{xxx} = 0$                                   dispersive wave       $u = u(t, x)$ 
  $i\hbar u_t = -\frac{\hbar^2}{2m} u_{xx} + V(x)u$            1D Schrodinger eqn.   $u = u(t, x)$ 
  $u_t + \frac12\sigma^2 x^2 u_{xx} + rxu_x - ru = 0$          Black-Scholes eqn.    $u = u(t, x)$ 
  $u_t = u_{xx} + ru(1 - u)$                                   Fisher's eqn.         $u = u(t, x)$ 

As stated before, the unknown function $u$ (the dependent variable) is always a function of 
more than one independent variable $t, x, y, \dots$. Loosely speaking, the dependent variable is the 
one that is being differentiated, whereas the independent variables are those we differentiate 
with respect to. 






The general form of a PDE is thus: 

$$F(u, u_{x_1}, \dots, u_{x_n}, u_{x_1 x_1}, u_{x_1 x_2}, \dots, u_{x_n x_n}, \dots, u_{x_i x_j x_k}, \dots, \text{other parameters}) = 0$$ 

13.1.1 Types of PDEs 

PDEs can be classified according to the following criteria: 

• Order: The order of a PDE is the order of the highest partial derivative in the equation. 
For example, the transport and shock wave equations are first order, the heat, 
wave, Laplace and Black-Scholes equations are second order, and the dispersive wave 
equation is third order. 

• Number of variables: The number of independent variables. 

• Linearity: We can write a PDE in operator form $\mathcal{L}u = f$, where $f$ is a function of just 
the independent variables. For example, the 1D heat equation is 

$$\mathcal{L}u = 0 \qquad\text{where}\qquad \mathcal{L} := \frac{\partial}{\partial t} - \frac{\partial^2}{\partial x^2}$$ 

The dispersive wave equation is 

$$\mathcal{L}u = 0 \qquad\text{where}\qquad \mathcal{L} := \frac{\partial}{\partial t} + u\frac{\partial}{\partial x} + \frac{\partial^3}{\partial x^3}$$ 

When the operator $\mathcal{L}$ is linear, the PDE is said to be linear. This means that the 
dependent variable and its derivatives occur linearly. The independent variables may 
occur non-linearly. The heat equation is linear, whereas the dispersive wave equation is 
not: the latter has the non-linear term $uu_x$. Here are some more examples: 

$$u_x + \sin(y)\,u_{yy} = 0 \quad\text{linear} \qquad\qquad u_{xx} + \sin(u) = 0 \quad\text{non-linear}$$ 

• Homogeneity: A PDE $\mathcal{L}u = f$ is homogeneous if $f = 0$ identically. Else it is 
non-homogeneous. 

• Kinds of coefficients: The coefficients may be constant or variable. 



13.2 Solutions to a PDE 
13.2.1 Contrast with ODEs 

ODEs usually have few types of solution. Consider, for example, the first-order linear ODE 

$$y' + 2xy = 2x$$ 

which can be solved by using an integrating factor. The idea is to multiply both sides by a 
function $I(x)$, the integrating factor, so that the lefthand side reduces to $(Iy)'$. Thus we 
want $Iy' + 2Ixy = (Iy)'$, from which it follows easily that $I' = 2Ix$. This yields a separable 
ODE for $I$, whose solution is 

$$\frac{dI}{I} = 2x\,dx \implies \ln I = x^2 + c \implies I = Ce^{x^2}$$ 






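We thus take $I(x) = e^{x^2}$ and obtain 

$$(e^{x^2} y)' = 2xe^{x^2}$$ 

so that 

$$e^{x^2} y = e^{x^2} + C, \qquad\text{i.e.}\qquad y = 1 + Ce^{-x^2}, \quad C \text{ constant}$$ 

These are all the solutions of the ODE (i.e. one for each value of the constant $C$), and the 
ODE thus determines the nature of the solution. 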
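
A solution family like this is easy to verify numerically. The sketch below (an added illustration; the sample points, the constants $C$ and the step size are arbitrary choices) evaluates the residual $y' + 2xy - 2x$ for $y = 1 + Ce^{-x^2}$, approximating $y'$ by a central difference.

```python
import math

def y(x, c):
    """Candidate solution y = 1 + C exp(-x^2)."""
    return 1.0 + c * math.exp(-x * x)

def residual(x, c, h=1e-5):
    """y' + 2xy - 2x, with y' approximated by a central difference."""
    dy = (y(x + h, c) - y(x - h, c)) / (2.0 * h)
    return dy + 2.0 * x * y(x, c) - 2.0 * x

worst = max(abs(residual(x, c))
            for c in (-2.0, 0.0, 0.5, 3.0)
            for x in (-1.5, -0.3, 0.0, 0.7, 2.0))
print(worst)   # tiny: only finite-difference and rounding error remain
```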

The situation is dramatically different for PDEs. Consider the following first order linear 
PDE $u_x + u_y = 0$. Let $f : \mathbb{R} \to \mathbb{R}$ be any differentiable function, and define $u(x, y) = f(x - y)$. 
Direct computation shows that 

$$u_x + u_y = f'(x - y) - f'(x - y) = 0$$ 

so that $u = f(x - y)$ is a solution of the PDE. Hence each of the following is a solution of the 
given PDE: 

u{x, y) = bx — by 
u{x, y) = {x- yf 

u{x,y) = 

[x - yY 

etc. etc. etc. The nature of the solution is very far from being determined by the PDE, and we 

usually need more information. We shall shortly discuss what sort of additional information 
we require: initial or boundary conditions. But first, we make a short detour: 

13.2.2 First-Order Linear PDEs 

We would like to be able to solve at least a few PDEs before we continue our qualitative 
discussion of the solutions to PDEs, to build intuition and confidence. We therefore show, in 
this subsection, how to solve certain kinds of first-order linear PDEs. We begin by
demonstrating a generalization of the technique used to solve the PDE $u_x + u_y = 0$ in the previous
subsection:

Example 13.2.1 Consider the PDE

$$u_x + yu_y = 0$$

which is first-order linear and homogeneous. Note that

$$u_x + yu_y = (\partial_x u \;\; \partial_y u)\begin{pmatrix}1\\ y\end{pmatrix} = \text{directional derivative of } u \text{ in direction } (1, y)$$

So the directional derivative of any solution in the direction $(1, y)$ is zero.

Consider now a function $y = y(x)$, which determines a curve consisting of points $(x, y(x))$
in the $xy$-plane. The tangent vector of such a curve at the point $(x, y)$ is $(1, y'(x))$. Now consider the
curves $y = y(x)$ which "point" in the direction $(1, y)$, i.e. which have tangent vector $(1, y)$ at
the point $(x, y)$. These curves satisfy $y' = y$. This ODE is easily solved: $y(x) = Ce^x$,
where $C$ is constant. The curves $y = Ce^x$ are called the characteristic curves of the PDE. On
each such curve, the directional derivative of a solution $u$ is zero, and hence $u$ is constant on
the characteristic curves. Another way to see this:

$$\frac{d}{dx}u(x, Ce^x) = u_x \cdot 1 + u_y \cdot Ce^x = u_x + yu_y = 0$$

It follows that

$$u(x, Ce^x) = u(0, Ce^0) = u(0, C) \quad\text{which is independent of } x$$

Given an arbitrary point $(x, y)$, we find $C$ such that $y = Ce^x$, namely $C = e^{-x}y$. It then
follows that

$$u(x, y) = u(0, e^{-x}y) \quad\text{which is a function of } e^{-x}y$$

Hence the general solution of the PDE is given by

$$u(x, y) = f(e^{-x}y) \quad\text{where } f : \mathbb{R} \to \mathbb{R} \text{ is differentiable}$$



□ 
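The conclusion of Example 13.2.1 can be checked numerically. The sketch below is illustrative only: the profile functions, sample points and finite-difference step are arbitrary choices, and the names `partial`, `make_solution` and `residual` are hypothetical helpers, not part of the notes.

```python
import math

def partial(u, x, y, wrt, h=1e-5):
    """Central-difference approximation of u_x (wrt='x') or u_y (wrt='y')."""
    if wrt == "x":
        return (u(x + h, y) - u(x - h, y)) / (2 * h)
    return (u(x, y + h) - u(x, y - h)) / (2 * h)

def make_solution(f):
    """Build u(x, y) = f(exp(-x) * y) from a differentiable profile f."""
    return lambda x, y: f(math.exp(-x) * y)

def residual(u, x, y):
    """Left-hand side u_x + y*u_y of the PDE, evaluated numerically."""
    return partial(u, x, y, "x") + y * partial(u, x, y, "y")

profiles = [math.sin, math.exp, lambda s: s**3 - 2 * s]
points = [(0.0, 1.0), (0.5, -2.0), (-1.0, 0.3)]
worst = max(abs(residual(make_solution(f), x, y))
            for f in profiles for (x, y) in points)

# u should also be constant along a characteristic y = C*exp(x).
u = make_solution(math.sin)
C = 0.7
drift = max(abs(u(x, C * math.exp(x)) - u(0.0, C)) for x in (-1.0, 0.4, 2.0))
print(worst, drift)
```

Both numbers should be zero up to rounding and discretization error, whichever differentiable profile f is used.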



Now try it yourself: 
Exercise 13.2.2 Consider the PDE

$$u_x + 2xy^2u_y = 0$$

(a) Show that the directional derivative of $u$ at the point $(x, y)$ in the direction $(1, 2xy^2)$ is
zero.

(b) Deduce that $u$ is constant on the characteristic curves $y = \frac{1}{C - x^2}$, $C$ constant.

(c) Conclude that $u(x, y) = f\left(x^2 + \frac{1}{y}\right)$.

□ 

The above technique works for PDEs of the form $a(x, y)u_x + b(x, y)u_y = 0$. It reduces the
solution of the PDE to that of an ODE $\frac{dy}{dx} = \frac{b(x,y)}{a(x,y)}$. The solutions of this ODE are the
characteristic curves of the PDE, along which every solution is constant. We can extend it to
homogeneous first-order linear PDEs with constant coefficients, as we now demonstrate by
example:

Example 13.2.3 Consider the PDE

$$3u_x + 2u_y - 5u = 0$$

We would like to put it in the form $au_x + bu_y = 0$, i.e. we need to get rid of the $u$-term.
Define a new function

$$v(x, y) = e^{-\frac53 x}u(x, y)$$

Then $v_x = e^{-\frac53 x}\left[u_x - \frac53 u\right]$ and so

$$3v_x + 2v_y = e^{-\frac53 x}\left[3u_x - 5u + 2u_y\right] = 0$$

We thus (this is an exercise!) solve the PDE $3v_x + 2v_y = 0$ as before to obtain $v(x, y) = f(y - \frac23 x)$, and hence the solution of the original PDE is

$$u(x, y) = e^{\frac53 x}f\left(y - \tfrac23 x\right)$$

as may easily be verified by direct differentiation.






□ 
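The direct differentiation mentioned in Example 13.2.3 can be written out as follows. This is a worked check under the assumption that $f$ is differentiable; the abbreviation $s$ is introduced here only for readability.

```latex
% Verification that u(x,y) = e^{5x/3} f(y - 2x/3) solves 3u_x + 2u_y - 5u = 0.
% Write s = y - (2/3)x, so that s_x = -2/3 and s_y = 1.
\begin{align*}
u_x &= e^{\frac53 x}\left[\tfrac53 f(s) - \tfrac23 f'(s)\right], \qquad
u_y = e^{\frac53 x} f'(s), \\
3u_x + 2u_y - 5u
  &= e^{\frac53 x}\left[5f(s) - 2f'(s) + 2f'(s) - 5f(s)\right] = 0 .
\end{align*}
```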



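Another technique for solving first-order linear PDEs involves changing coordinates:
Consider the PDE

$$au_x + bu_y + g(x, y)u = 0$$

Define new coordinates

$$\begin{pmatrix}\bar x\\ \bar y\end{pmatrix} = \begin{pmatrix}a & b\\ b & -a\end{pmatrix}\begin{pmatrix}x\\ y\end{pmatrix} \quad\text{so that}\quad \begin{pmatrix}x\\ y\end{pmatrix} = \frac{1}{a^2+b^2}\begin{pmatrix}a & b\\ b & -a\end{pmatrix}\begin{pmatrix}\bar x\\ \bar y\end{pmatrix}$$

Now by the chain rule for differentiation

$$u_x = au_{\bar x} + bu_{\bar y} \qquad u_y = bu_{\bar x} - au_{\bar y}$$

so that $au_x + bu_y = (a^2 + b^2)u_{\bar x}$, and the PDE becomes

$$u_{\bar x} + \frac{1}{a^2+b^2}\,\bar g(\bar x, \bar y)u = 0 \quad\text{where}\quad \bar g(\bar x, \bar y) = g\left(\frac{a\bar x + b\bar y}{a^2+b^2}, \frac{b\bar x - a\bar y}{a^2+b^2}\right) = g(x, y)$$

Thus the new PDE looks like a first-order linear ODE of the form $v' + f_{\bar y}(\bar x)v = 0$ with solution
$v(\bar x) = h(\bar y)e^{-\int f_{\bar y}(\bar x)\,d\bar x}$, where $h$ is an arbitrary differentiable function of the "parameter" $\bar y$
only. Thus we get

$$u = h(\bar y)\,e^{-\frac{1}{a^2+b^2}\int \bar g(\bar x, \bar y)\,d\bar x}$$

which we can transform back to $(x, y)$-coordinates to obtain $u(x, y)$.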
Example 13.2.4 We solve the PDE

$$u_x + 2u_y + (2x - y)u = 0$$

New coordinates are $\bar x = x + 2y$, $\bar y = 2x - y$, and the PDE becomes

$$u_{\bar x} + \tfrac15\bar y u = 0$$

which can be solved to obtain

$$u = h(\bar y)e^{-\frac15\bar x\bar y} = h(2x - y)\,e^{-\frac{2x^2+3xy-2y^2}{5}} \qquad h \text{ arbitrary}$$



□ 



Example 13.2.5 We solve the PDE

$$u_x + 2u_y + (2x - y)u = 2x^2 + 3xy - 2y^2$$

We have already solved the homogeneous PDE $u_x + 2u_y + (2x - y)u = 0$ in the previous
example, with general solution $u(x, y) = h(2x - y)\,e^{-\frac{2x^2+3xy-2y^2}{5}}$. When we move to the
$(\bar x, \bar y)$-coordinates described there, the PDE becomes

$$u_{\bar x} + \tfrac15\bar y u = \tfrac15\bar x\bar y$$

This looks like a first-order linear ODE involving a "parameter" $\bar y$, with integrating factor
$I(\bar x) = e^{\frac{\bar x\bar y}{5}}$, and yields

$$\frac{\partial (Iu)}{\partial \bar x} = \tfrac15\bar x\bar y\,e^{\frac{\bar x\bar y}{5}}$$

Integrating by parts, we obtain

$$Iu = \bar x e^{\frac{\bar x\bar y}{5}} - \frac{5}{\bar y}e^{\frac{\bar x\bar y}{5}} + D$$

where the "constant" of integration $D$ may involve the "parameter" $\bar y$. Thus

$$u = \bar x - \frac{5}{\bar y} + De^{-\frac{\bar x\bar y}{5}}$$

Translating back to $(x, y)$-coordinates, we see that

$$u(x, y) = x + 2y - \frac{5}{2x - y} + D(2x - y)\,e^{-\frac{2x^2+3xy-2y^2}{5}} \qquad D \text{ arbitrary}$$



□ 
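The general solution obtained in Example 13.2.5 can be spot-checked by substituting it back into the PDE. The sketch below approximates the partial derivatives by central differences; the sample points (chosen away from the line 2x = y, where the formula is singular), the step size and the sample choices for the arbitrary function D are all assumptions made for illustration.

```python
import math

def u(x, y, D=math.cos):
    """Claimed solution x + 2y - 5/(2x - y) + D(2x - y) * exp(-(2x^2 + 3xy - 2y^2)/5)."""
    q = 2 * x * x + 3 * x * y - 2 * y * y   # equals (x + 2y) * (2x - y)
    s = 2 * x - y                           # the "parameter" y-bar
    return x + 2 * y - 5 / s + D(s) * math.exp(-q / 5)

def residual(x, y, D=math.cos, h=1e-5):
    """u_x + 2u_y + (2x - y)u minus the right-hand side 2x^2 + 3xy - 2y^2."""
    ux = (u(x + h, y, D) - u(x - h, y, D)) / (2 * h)
    uy = (u(x, y + h, D) - u(x, y - h, D)) / (2 * h)
    rhs = 2 * x * x + 3 * x * y - 2 * y * y
    return ux + 2 * uy + (2 * x - y) * u(x, y, D) - rhs

points = [(1.0, 0.5), (0.3, -0.4), (-0.8, 0.2)]
choices_of_D = (math.cos, math.exp, lambda s: 1.0 + s * s)
worst = max(abs(residual(x, y, D)) for D in choices_of_D for (x, y) in points)
print("largest residual:", worst)
```

Taking D identically zero in the same check isolates the particular solution x + 2y - 5/(2x - y).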



Changes of coordinates will play an important role in the next section, where we take a 
first look at second-order linear PDEs. 

We have now seen that the characteristic curves of a first-order linear PDE are related
to a choice of coordinates which simplifies the PDE, and turns it into an ODE. The following
exercise gives another important interpretation: The characteristic curves of a PDE $L[u] = f$
are exactly those curves along which knowledge of the values of $u$ does not determine the
values of $u_x, u_y$. We will return to this when we discuss characteristics for second-order
linear PDEs.

Exercise 13.2.6 We show that the characteristic curves of a quasilinear first-order PDE

$$a(x, y, u)u_x + b(x, y, u)u_y = c(x, y, u) \qquad (*)$$

are exactly those curves along which the PDE (*) together with knowledge of the values of $u$
along the curve is insufficient to determine $u_x, u_y$.

6.1 Let $\Gamma$ be a curve in the $xy$-plane, parametrized by a variable $t$:

$$x = x(t) \qquad y = y(t) \qquad\text{along } \Gamma$$

$u$ is supposedly known along $\Gamma$, so there is a known function $f$ such that

$$u(x(t), y(t)) = f(t)$$

Show that we have the following system of linear equations for $u_x, u_y$:

$$x'u_x + y'u_y = f'$$
$$au_x + bu_y = c$$

where $a = a(x, y, u) = a(x(t), y(t), f(t))$ is known, and the same is true for $b, c$.

6.2 Show that this system uniquely determines $u_x, u_y$, unless

$$x'b - y'a = 0 \qquad (\dagger)$$

6.3 Show that $(\dagger)$ holds precisely when $b\,dx - a\,dy = 0$ along $\Gamma$, i.e. precisely when $\Gamma$ is a
characteristic of (*).






□ 

We end our detour with one more example, after which we will continue our qualitative 
discussion of solutions of PDEs. 

Example 13.2.7 The one-dimensional wave operator (or D'Alembert operator) $\partial_t^2 - c^2\partial_x^2$
factorizes:

$$\partial_t^2 - c^2\partial_x^2 = (\partial_t + c\partial_x)(\partial_t - c\partial_x)$$

As a result, it is easy to find the general solution of the wave equation $u_{tt} - c^2u_{xx} = 0$: If $u$
is a solution, then

$$v := (\partial_t - c\partial_x)u \quad\text{is a solution to}\quad v_t + cv_x = 0$$

This is a homogeneous first order linear equation, with general solution $v = h(x - ct)$, where
$h$ is an arbitrary function. We thus obtain $(\partial_t - c\partial_x)u = v$, which is an inhomogeneous first
order linear equation. The corresponding homogeneous equation $u_t - cu_x = 0$ has general
solution $g(x + ct)$, where $g$ is arbitrary. If we try to find a particular solution of the form
$u_1(x, t) = f(x - ct)$ to the inhomogeneous equation $u_t - cu_x = v$, we find that

$$f'(s) = -\frac{h(s)}{2c} \quad\text{and so}\quad f(s) = -\frac{1}{2c}\int h(s)\,ds$$

But since $h$ is arbitrary, so is $f$, i.e.

$$u_1(x, t) = f(x - ct) \qquad f \text{ an arbitrary function}$$

If $u$ is another solution to the inhomogeneous equation $u_t - cu_x = v$, then (by linearity) $u - u_1$
is a solution to the homogeneous equation, i.e. $u - u_1 = g(x + ct)$, where $g$ is arbitrary.
Interchanging the names of the two arbitrary functions, it follows that

$$u(t, x) = f(x + ct) + g(x - ct)$$

where $f, g$ are arbitrary. This should be verified by direct differentiation.

Note that $f(x + ct)$ can be interpreted as the graph of $f(x)$ moving to the left at speed
$c$, whereas $g(x - ct)$ is the graph of $g(x)$ moving to the right at speed $c$. The parameter $c$ in
the wave equation $u_{tt} - c^2u_{xx} = 0$ can therefore be interpreted as the speed of the wave.

□ 
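The verification suggested at the end of Example 13.2.7 can also be done numerically. The following sketch is illustrative only: the wave speed `c = 2.0`, the profiles `f` and `g`, the sample points and the step size are arbitrary choices, and second derivatives are approximated by second central differences.

```python
import math

c = 2.0  # illustrative wave speed

def f(s):
    """Left-moving profile (arbitrary smooth choice)."""
    return math.sin(s)

def g(s):
    """Right-moving profile (arbitrary smooth choice)."""
    return math.exp(-s * s)

def u(t, x):
    """d'Alembert form u(t, x) = f(x + ct) + g(x - ct)."""
    return f(x + c * t) + g(x - c * t)

def second_diff(fn, z, h=1e-3):
    """Second central difference approximating fn''(z)."""
    return (fn(z + h) - 2 * fn(z) + fn(z - h)) / (h * h)

def wave_residual(t, x):
    """u_tt - c^2 u_xx, which should vanish for a solution."""
    u_tt = second_diff(lambda s: u(s, x), t)
    u_xx = second_diff(lambda s: u(t, s), x)
    return u_tt - c * c * u_xx

worst = max(abs(wave_residual(t, x))
            for t in (0.0, 0.3, 1.0) for x in (-1.0, 0.2, 1.5))
print("largest residual:", worst)
```

The residual is small but not exactly zero, because the difference quotients carry a discretization error of order h squared.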

13.2.3 Initial- and Boundary Conditions

There is no general mathematical theory for solving partial differential equations, and one
can seldom solve the PDE in general, as we were able to do for the first-order PDEs and
one-dimensional wave equation in the preceding subsection. Instead, research has focussed
on particular classes of PDEs, often suggested by an applied/physical problem. Different
techniques have been developed for solving different classes of PDEs. And certain auxiliary
conditions, also suggested by the applied/physical problem, are imposed as well. These are of the
following types:

Initial Value Problems (IVP) As suggested by the name, these specify the state at an 
initial time, usually t = 0. IVP are also called Cauchy problems. 






Examples 13.2.8 1. Consider the one-dimensional heat equation

$$u_t = ku_{xx} \qquad k > 0$$

which governs the evolution of the temperature in an infinitely long homogeneous rod:
$u(t, x)$ is the temperature at point $x$ at time $t$. This PDE has many solutions. But using
physical intuition about temperatures, one would guess that the temperature at times
$t > 0$ is completely determined by the temperature at $t = 0$, i.e. that there is just one
function $u$ which satisfies the IVP

$$\left.\begin{aligned} u_t &= ku_{xx} &&\text{PDE}\\ u(0, x) &= \varphi(x) &&\text{Initial Condition}\end{aligned}\right\} = \text{Initial Value Problem}$$

2. Consider the one-dimensional wave equation

$$u_{tt} = c^2u_{xx}$$

which governs the evolution of a vibrating string: $u(t, x)$ is the displacement at point $x$
at time $t$. Physical intuition suggests that we need two initial conditions to specify the
solution uniquely: not just the initial displacement, but also the initial velocity:

$$\left.\begin{aligned} u_{tt} &= c^2u_{xx} &&\text{PDE}\\ u(0, x) &= \varphi(x) &&\text{IC}\\ u_t(0, x) &= \psi(x) &&\text{IC}\end{aligned}\right\} = \text{IVP}$$

In these examples, we see on physical grounds that we need different kinds of data to
find unique solutions to these PDEs. (This must, of course, also be proven mathematically.)

□ 



Boundary Value Problems (BVP) Given a particular applied problem modelled by a
PDE, the PDE is usually only valid in a certain open set $U$, and the nature of the problem
may require that the solution $u$ satisfy certain auxiliary conditions on the boundary $\partial U$ of
$U$. In practice, these boundary conditions are of three types:

I. Dirichlet conditions specify the value of the solution $u$ on the boundary:

$$u = f \quad\text{on } \partial U$$

II. Neumann conditions specify the value of the outward normal derivative $\frac{\partial u}{\partial n} := Du \cdot n$ on
the boundary $\partial U$. (Here, $n$ is a generic unit outward pointing vector which is normal
to the boundary $\partial U$.)

$$\frac{\partial u}{\partial n} = f \quad\text{on } \partial U$$

III. Robin conditions specify the value of $\frac{\partial u}{\partial n} + au$ on the boundary $\partial U$. Here, $a$ is a function
of the independent variables $t, x, y, z$.

$$\frac{\partial u}{\partial n} + au = f \quad\text{on } \partial U$$






Examples 13.2.9 1. Consider a vibrating string of length $L$. The appropriate PDE, the
one-dimensional wave equation $u_{tt} = c^2u_{xx}$, is valid on the open interval $U = (0, L)$, whose
boundary consists of just the two endpoints: $\partial U = \{0, L\}$.

• If both ends of the string are held fixed, then we have homogeneous Dirichlet boundary
conditions:

$$u(t, 0) = 0 = u(t, L) \qquad\text{all } t > 0$$

• Suppose now that the $x = 0$ end is fixed, but that the $x = L$ end is able to freely
slide up and down along a frictionless track. Then we have

$$u(t, 0) = 0 = u_x(t, L) \qquad\text{all } t > 0$$

i.e. a Dirichlet condition holds on the left end, and a Neumann condition on the right.
Note that at the right end the outward normal derivative is just the ordinary partial derivative
$\frac{\partial u}{\partial x}$.

• If the $x = 0$ end is fixed and the $x = L$ end is able to move up and down along
a frictionless track, but is attached to a spring or rubber band (satisfying Hooke's
Law), then

$$u(t, 0) = 0 = u_x(t, L) + ku(t, L) \qquad\text{all } t > 0$$

i.e. a Dirichlet condition on the left boundary, and a Robin condition on the right.

2. Suppose that an object occupies a region of space whose interior is the open set $U$. The
evolution of the temperature is governed by the heat equation $u_t = u_{xx} + u_{yy} + u_{zz}$.

• If the object is immersed in a heat reservoir of temperature $g(t)$, we would have
Dirichlet boundary conditions: $u(t, x) = g(t)$ for $x \in \partial U$.

• If the object is perfectly insulated, then no heat flows across the boundaries, i.e. we
have homogeneous Neumann boundary conditions: $\frac{\partial u}{\partial n} = 0$.

• If the object is immersed in a medium at temperature $T_0$, then an inhomogeneous
Robin boundary condition would hold on the boundary: $\frac{\partial u}{\partial n} = -k(u - T_0)$.

□ 



Initial-Boundary Value Problems (IBVP) Many problems involve both initial and
boundary conditions. For example, if a rod of length $L$ is insulated along its length, has
an initial temperature $\varphi(x)$, has the $x = 0$ end kept at a temperature $g(t)$ and the $x = L$
end immersed in a reservoir of temperature $T_0$, then we have the IBVP

$$u_t = u_{xx}$$
$$u(0, x) = \varphi(x)$$
$$u(t, 0) = g(t)$$
$$u_x(t, L) = -k(u(t, L) - T_0)$$







13.2.4 Well-Posed Problems 

As said before, PDEs can have many solutions. Consider the following example: 

Example 13.2.10 Consider a homogeneous rod of length $L$ and let $u(t, x)$ be the temperature
at time $t$ at point $x$. The $x = 0$ end of the rod is kept at 0 °C and the $x = L$ end at 100 °C.
Suppose now that the rod has reached equilibrium, i.e. that the temperature is not changing
over time. Then $u(t, x) =: U(x)$ is a function of just $x$. To determine the temperature
$U(x)$ of the rod at point $x$ in equilibrium we must solve the following BVP: The heat equation
$u_t = ku_{xx}$ becomes $0 = U_{xx}$, and the boundary conditions become $U(0) = 0$, $U(L) = 100$.
Thus

$$U_{xx} = 0 \text{ on } (0, L) \quad\text{subject to}\quad U(0) = 0,\; U(L) = 100$$

Now

$$U_1(x) := \begin{cases} 5 & \text{if } 0 < x < L\\ 0 & \text{if } x = 0\\ 100 & \text{if } x = L\end{cases}$$

is clearly a solution to the BVP. So is

$$U_2(x) := \begin{cases} 20x + 72 & \text{if } 0 < x < L\\ 0 & \text{if } x = 0\\ 100 & \text{if } x = L\end{cases}$$

Yet physical intuition tells us that the correct solution is

$$U(x) = \frac{100x}{L}$$

$U_1, U_2$ are pathological: We have simply taken arbitrary solutions of the PDE and pasted the
boundary conditions onto them. Note that $U_1, U_2$ are continuous on $(0, L)$ but not on $[0, L]$.
$U$, however, is continuous on $[0, L]$. Moreover, it is easy to see that this is the only solution
which is continuous on $[0, L]$. Thus by demanding, for this problem, that the solution be
continuous up to the boundary, we guarantee a unique solution.

This suggests that we should require some regularity in our solutions. 

□ 

An initial and/or boundary value problem is said to be well-posed if and only if:

(i) Existence: There is at least one solution. 

(ii) Uniqueness: There should be no more than one solution. 

(iii) Continuous dependence on the data: Small changes in the data produce only small 
changes in the solution. 

(i) and (ii) allow us to talk about the solution (as opposed to a solution), and (iii) is a
requirement if we want to be able to calculate the solution: Calculation often involves approximation,
e.g. in terms of limiting processes such as integrals, series expansions, etc.
Related to (i)-(iii) above are three "practical" questions:






(i') How do we construct a solution? 

(ii') How regular (e.g. continuous, differentiable) is the solution? 

(iii') If an exact analytic formula for a solution cannot be found, how do we approximate the 
solution? 

Remarks 13.2.11 The context of the problem determines what kind of extra regularity
conditions we should impose on the problem. In the example of the rod in thermal equilibrium,
above, we imposed continuity up to the boundary to obtain a unique solution. One might think
that, e.g., we should always require that a solution of an $n$-th order PDE be $n$-times differentiable on the
interior of its domain. This is not true, as PDEs can also be used to model inherently
discontinuous phenomena, such as shock waves, and mathematicians have come up with various
kinds of objects that may count as a solution (not only $n$-times differentiable functions) by
interpreting the PDE for these objects in different ways. The bottom line is that the context
of the PDE (i.e. the problem the PDE is trying to model) is always important.

□ 

For a PDE together with some auxiliary conditions, such as initial- and/or boundary condi- 
tions, to be well-posed, there must be a delicate balance. There can't be too few auxiliary 
conditions, or the solution will not be unique (e.g. the vibrating string where the initial dis- 
placement is given, but not the initial velocity) . There can't be too many auxiliary conditions, 
or they will "clash" (i.e. be mutually inconsistent), and there won't be a solution. We will 
discuss the well-posedness of certain classes of PDEs later in the course. 

13.3 Classification of Linear Second-Order PDEs

It is an interesting fact that many phenomena may be modelled, at least as a first
approximation, by linear second order PDEs. The heat, wave and Laplace equations are canonical
examples of 2nd order linear PDEs. The Laplace operator $\Delta = \frac{\partial^2}{\partial x_1^2} + \dots + \frac{\partial^2}{\partial x_n^2}$ plays an
important role. We recall the two-dimensional versions, with $\Delta = \frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2}$:

Heat equation : $\frac{\partial u}{\partial t} - \Delta u = 0$
Wave equation : $\frac{\partial^2 u}{\partial t^2} - \Delta u = 0$
Laplace's equation : $\Delta u = 0$

The general second order linear PDE has the form

$$\underbrace{\sum_{i,j=1}^n a_{ij}u_{x_ix_j}}_{\text{principal part}} + \sum_{i=1}^n b_iu_{x_i} + cu = d$$

where the $a_{ij}, b_i, c, d$ are functions of the independent variables $x_i$ only. Because $u_{x_ix_j} = u_{x_jx_i}$,
the coefficients of the principal part may always be arranged so that $a_{ij} = a_{ji}$, i.e. so that
the matrix $A := (a_{ij})$ is symmetric. The linear second order differential equations may be
further classified into three groups, according to the properties of the matrix $A$.
Recall the following spectral decomposition theorem from linear algebra:
Recall the following spectral decomposition theorem from linear algebra: 






Theorem 13.3.1 Every symmetric matrix has an orthonormal basis of eigenvectors. 

□ 

Remarks 13.3.2 1. Suppose that $A$ is symmetric, with orthonormal basis of eigenvectors
$v_i$, so that $Av_i = \lambda_iv_i$ for eigenvalues $\lambda_i$. Consider the matrix $O$ whose columns are the
eigenvectors $v_j$, i.e. $O_{ij} = v_{ji}$ (the $i$-th component of $v_j$). Then

$$(O^{tr}O)_{ij} = \sum_k O^{tr}_{ik}O_{kj} = \sum_k v_{ik}v_{jk} = \langle v_i, v_j\rangle = \delta_{ij}$$

i.e. $O^{tr} = O^{-1}$. Now $AO = OD[\lambda]$, where $D[\lambda]$ is the diagonal matrix with eigenvalues
along the diagonal, i.e. $D[\lambda]_{ij} = \delta_{ij}\lambda_i$. It follows that

$$O^{tr}AO = D[\lambda]$$

i.e. that $A$ is diagonalizable.

2. Now recall that a symmetric matrix is said to be non-negative definite (or positive
semidefinite) if and only if

$$x^{tr}Ax \geq 0 \quad\text{for all } x$$

This will be the case precisely when all of $A$'s eigenvalues are non-negative. For if $\lambda_i \geq 0$ for
all $i$, and if $x = \sum_j a_jv_j$ is arbitrary, then

$$x^{tr}Ax = \Big(\sum_i a_iv_i\Big)^{tr}A\Big(\sum_j a_jv_j\Big) = \sum_{i,j} a_ia_j\lambda_j\delta_{ij} = \sum_i a_i^2\lambda_i \geq 0$$

Conversely, if $A$ is non-negative definite, then

$$0 \leq v_i^{tr}Av_i = \lambda_i\|v_i\|^2$$

so that each eigenvalue $\lambda_i$ is $\geq 0$.

Similar remarks can be made for strictly positive definite and for negative definite matrices.

3. An orthogonal transformation $B = O^{tr}AO$ does not change the "definiteness" of a matrix,
because $B$ and $A$ have the same eigenvalues: If $v$ is an eigenvector of $A$ with corresponding
eigenvalue $\lambda$, then $O^{tr}v$ is an eigenvector of $B$, with the same eigenvalue. (Note that $B$ is
automatically symmetric if $A$ is.)

□ 

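Suppose we have a 2nd order linear PDE

$$\sum_{i,j=1}^n a_{ij}u_{x_ix_j} + \sum_{i=1}^n b_iu_{x_i} + cu = d$$

Define new variables $\xi_r$ $(r = 1, \dots, n)$ by a linear change as follows:

$$\xi_r = \sum_i p_{ir}x_i \quad\text{i.e.}\quad \xi = P^{tr}x \quad\text{where } P = (p_{ir})$$

Then

$$\frac{\partial}{\partial x_i} = \sum_r p_{ir}\frac{\partial}{\partial \xi_r}$$

and similarly

$$\frac{\partial^2}{\partial x_i\partial x_j} = \sum_{r,s} p_{ir}p_{js}\frac{\partial^2}{\partial \xi_r\partial \xi_s}$$

In the new coordinates, the principal part of the given PDE therefore takes the form

$$\sum_{i,j} a_{ij}u_{x_ix_j} = \sum_{r,s}\Big(\sum_{i,j} p_{ir}a_{ij}p_{js}\Big)u_{\xi_r\xi_s}$$

i.e. the principal part has matrix $C = P^{tr}AP$.

By ensuring that $C$ is a diagonal matrix, we can remove the mixed partial derivatives
$u_{\xi_r\xi_s}$, $r \neq s$. In particular, it follows from the preceding remarks that if we take $P = O$, the
matrix whose columns form an orthonormal basis of eigenvectors, then $C = O^{tr}AO = D[\lambda]$,
i.e.

$$\sum_{i,j} a_{ij}u_{x_ix_j} = \sum_r \lambda_ru_{\xi_r\xi_r}$$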

Hence any second order linear differential equation can have its principal part reduced to
diagonal form (i.e. involving no mixed partial derivatives) by a linear change of variables.
We now classify the 2nd order linear PDEs as follows:

I. If $A$ is strictly positive definite or strictly negative definite, i.e. if all the eigenvalues are
non-zero and have the same sign, we say that the PDE is elliptic.

Laplace's equation is the canonical example of an elliptic PDE.

II. If $A$ has a zero eigenvalue, i.e. if $\det(A) = 0$, then the PDE is parabolic.
The heat equation is the canonical example of a parabolic equation.

III. If all the eigenvalues are non-zero and one has sign opposite to all the others, we say
the PDE is hyperbolic.

The wave equation is the canonical example of a hyperbolic equation.

IV. Any other type is called ultrahyperbolic, but these do not occur often in practice, and
only in dimension $n \geq 4$.







□ 



In stochastic analysis and finance, it is the second order parabolic equations which occur 
most frequently. 






Exercise 13.3.3 Consider a second order linear PDE

$$Au_{xx} + 2Bu_{xy} + Cu_{yy} + Du_x + Eu_y + Fu = G$$

Show that the PDE is

(a) Elliptic if $B^2 - AC < 0$;

(b) Parabolic if $B^2 - AC = 0$;

(c) Hyperbolic if $B^2 - AC > 0$.
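The two descriptions of the type (sign of the discriminant, and signs of the eigenvalues of the symmetric matrix of the principal part) can be compared on concrete coefficients. The sketch below is restricted to two independent variables; the function names and the tolerance `tol` are assumptions made for illustration, and it uses the fact that the product of the two eigenvalues equals the determinant AC - B^2.

```python
import math

def classify_by_discriminant(A, B, C, tol=1e-12):
    """Type of A u_xx + 2B u_xy + C u_yy + ... from the sign of B^2 - AC."""
    disc = B * B - A * C
    if disc < -tol:
        return "elliptic"
    if disc > tol:
        return "hyperbolic"
    return "parabolic"

def eigenvalues_2x2(A, B, C):
    """Eigenvalues of the symmetric matrix [[A, B], [B, C]] in closed form."""
    mean = (A + C) / 2
    radius = math.hypot((A - C) / 2, B)
    return mean - radius, mean + radius

def classify_by_eigenvalues(A, B, C, tol=1e-12):
    """Type from the eigenvalue signs, following the classification I-III."""
    lo, hi = eigenvalues_2x2(A, B, C)
    if abs(lo) <= tol or abs(hi) <= tol:
        return "parabolic"
    return "elliptic" if lo * hi > 0 else "hyperbolic"

examples = {
    "Laplace u_xx + u_yy": (1, 0, 1),
    "heat u_t - u_xx (variables x, t)": (-1, 0, 0),
    "wave u_tt - u_xx": (1, 0, -1),
    "9u_xx - 24u_xy + 16u_yy": (9, -12, 16),
}
for name, coeffs in examples.items():
    print(name, classify_by_discriminant(*coeffs), classify_by_eigenvalues(*coeffs))
```

This is a numerical comparison, not a proof: the exercise still asks for the general argument linking the discriminant to the eigenvalue signs.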



Remarks 13.3.4 The names elliptic, parabolic and hyperbolic are adopted from conic
sections. For example, an ellipse has basic equation $ax^2 + by^2 = \text{const.}$, with $a, b > 0$, and this can
always be reduced to $\bar x^2 + \bar y^2 = \text{const.}$ by a change of variables: just take $\bar x := \sqrt{a}\,x$, $\bar y := \sqrt{b}\,y$.
So after a change of variables, an ellipse becomes a circle. Now the expression $x^2 + y^2$ is
formally analogous to $u_{xx} + u_{yy}$. Indeed

$$x^2 + y^2 = (x \;\; y)\begin{pmatrix}x\\ y\end{pmatrix} \quad\text{whereas}\quad u_{xx} + u_{yy} = (\partial_x \;\; \partial_y)\begin{pmatrix}\partial_x\\ \partial_y\end{pmatrix}u$$

So Laplace's equation $u_{xx} + u_{yy} = 0$ is said to be elliptic.

Similarly, the basic equation of a parabola is $y - x^2 = \text{const.}$, and hence the heat equation
$u_y - u_{xx} = 0$ is said to be parabolic. Hyperbolas have two forms for the basic equation:
$xy = \text{const.}$ and $x^2 - y^2 = \text{const.}$ (If you are only familiar with the first form, note that the
change of variables $x = \bar x + \bar y$, $y = \bar x - \bar y$ transforms $xy = \text{const.}$ to $\bar x^2 - \bar y^2 = \text{const.}$) Hence the
PDEs $u_{xy} = 0$ and $u_{xx} - u_{yy} = 0$, both forms of the wave equation, are said to be hyperbolic.

□ 



This analogy between second order linear PDEs with constant coefficients and conic sec- 
tions carries over to more complicated situations: 

Example 13.3.5 Consider the PDE

$$9u_{xx} - 24u_{xy} + 16u_{yy} - 18u_x - 101u_y + 19 = 0$$

An inspection of the principal part shows that it is parabolic: $b^2 - ac = 12^2 - 9 \times 16 = 0$.
We show how it can be transformed to a heat equation by a change of variables, and
simultaneously show the strength of the connection with conic sections. The principal part is

$$9u_{xx} - 24u_{xy} + 16u_{yy} = (\partial_x \;\; \partial_y)\begin{pmatrix}9 & -12\\ -12 & 16\end{pmatrix}\begin{pmatrix}\partial_x\\ \partial_y\end{pmatrix}u$$

The principal part of the corresponding conic section is

$$9x^2 - 24xy + 16y^2 = (x \;\; y)\begin{pmatrix}9 & -12\\ -12 & 16\end{pmatrix}\begin{pmatrix}x\\ y\end{pmatrix}$$

The eigenvalues of the matrix $A$ which determines this quadratic form are easily determined to
be $\lambda = 0$ and $\lambda = 25$, with corresponding eigenvectors $(\frac45 \;\; \frac35)^{tr}$ and $(-\frac35 \;\; \frac45)^{tr}$, both normalized
to have unit length. The orthogonal matrix $O$ whose columns are these eigenvectors is used
to define new coordinates:

$$\begin{pmatrix}\xi\\ \eta\end{pmatrix} = O^{tr}\begin{pmatrix}x\\ y\end{pmatrix} \quad\text{so that}\quad \begin{pmatrix}x\\ y\end{pmatrix} = O\begin{pmatrix}\xi\\ \eta\end{pmatrix} \quad\text{where}\quad O = \begin{pmatrix}\frac45 & -\frac35\\ \frac35 & \frac45\end{pmatrix}$$

The equation of the conic section in the new coordinates is therefore

$$(\xi \;\; \eta)\,O^{tr}AO\begin{pmatrix}\xi\\ \eta\end{pmatrix} - 18\left(\tfrac45 \;\; -\tfrac35\right)\begin{pmatrix}\xi\\ \eta\end{pmatrix} - 101\left(\tfrac35 \;\; \tfrac45\right)\begin{pmatrix}\xi\\ \eta\end{pmatrix} + 19 = 0$$

i.e.

$$25\eta^2 - 75\xi - 70\eta + 19 = 0$$

Completing the square shows that this is equivalent to

$$\left(\eta - \tfrac75\right)^2 = 3\left(\xi + \tfrac25\right)$$

and thus we see that in $(\xi, \eta)$-coordinates the conic section is a parabola. However, the
$(\xi, \eta)$-coordinates were obtained from the $(x, y)$-coordinates via an orthogonal matrix, i.e. a
rotation. Since a rotated parabola is still a parabola, we see that the conic section
$9x^2 - 24xy + 16y^2 - 18x - 101y + 19 = 0$ defines a parabola.

We can further simplify the equation of the parabola to standard form by a translation of the
axes: We get

$$\tau^2 - t = 0 \quad\text{where}\quad t := 3\left(\xi + \tfrac25\right), \quad \tau := \eta - \tfrac75$$

Repeating all these steps for the PDE, as described before when we discussed removal of
mixed partial derivatives, we see that in $(\xi, \eta)$-coordinates the PDE becomes

$$25u_{\eta\eta} - 75u_\xi - 70u_\eta + 19 = 0$$

which is equivalent to the heat equation when we move to $(\tau, t)$-coordinates

$$u_{\tau\tau} - u_t = 0$$

□ 
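The matrix arithmetic behind Example 13.3.5 is easy to reproduce. The sketch below uses plain nested lists; the helpers `matmul` and `transpose` are illustrative stand-ins for a linear algebra library, and floating-point results agree with the exact values only up to rounding.

```python
def matmul(X, Y):
    """Product of two matrices given as nested lists."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def transpose(X):
    """Transpose of a matrix given as nested lists."""
    return [list(row) for row in zip(*X)]

A = [[9, -12], [-12, 16]]              # matrix of the principal part
O = [[4 / 5, -3 / 5], [3 / 5, 4 / 5]]  # columns: unit eigenvectors for 0 and 25

# O^tr A O should be the diagonal matrix with entries 0 and 25.
D = matmul(transpose(O), matmul(A, O))

# The linear part -18x - 101y becomes (coefficient of xi, coefficient of eta).
lin = matmul([[-18, -101]], O)

# O should be orthogonal: O^tr O = identity.
I2 = matmul(transpose(O), O)
print(D, lin, I2)
```

Up to rounding, D has the single non-zero entry 25 in its lower right corner and lin is (-75, -70), matching the coefficients in the transformed equation.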

Exercises 13.3.6 Classify and transform the following PDEs so that the principal part is in
canonical form:

1. $u_{xx} - u_{xt} - u_{tt} - u_x - u_t = 0$

2. $2u_{xx} + u_{xt} + 2u_{tt} + u = 0$

3. $-3u_{xx} + 10u_{xt} - 3u_{tt} + 7u_x - u_t + u = 0$

□ 



The above technique shows how to reduce the principal part of a second-order linear PDE 
with constant coefficients to canonical form. A simple trick allows us to remove the lower 
order derivatives: 






Exercise 13.3.7 (a) Given 

show that a transformation of dependent variable
transforms the PDE to 

Vxixi+Vx2X2-Vx3X3H^+'^Ci)Vx^ + {-U+2c2)Vx2 + {8+2c3)Ux3 + {cl+cl-cl+Qci-Uc2+8C3)v = 

Now choose ci = —3, C2 = 7, C3 = —4 to obtain 

VxiXl ~^ '^X2X2 "^X^Xz "I" 126^ = 

(b) Find a change of dependent variable that reduces the parabolic PDE 

"^xx + — 2ut + 8u = 

to the one-dimensional heat equation vt = kVxx- 
[Hint: Define v by u{t,x) = a;)e"^+^*.] 

□ 

From the above examples and exercises it is clear that any second order linear PDE with 
constant coefficients can, via a transformation of coordinates and/or change of dependent 
variable, be reduced to either the Laplace-, or the heat- or the wave equation, depending on 
whether it is elliptic, parabolic or hyperbolic. We have perhaps spent an inordinate amount 
of time on this, and will have still more to say about it in the next section. It is therefore 
important that you understand why we emphasize this classification of PDEs: Essentially, 
there are just three kinds of behaviour a solution to a second order linear PDE can exhibit. 
The solution can either imitate a system in equilibrium (if it is elliptic), or, for evolving 
systems, it can imitate something like heat diffusion (if it is parabolic) or something like wave 
motion (if it is hyperbolic). After all, a smooth change of coordinates shouldn't change the 
qualitative properties of the solution, and a solution which behaves like diffusion in one set of
coordinates will presumably behave like diffusion in another set of coordinates as well. The 
classification of 2nd order linear PDEs thus allows you to use your intuition about physical 
processes to obtain qualitative information about the nature of the solutions of these PDEs. 

13.4 Characteristics for 2nd-Order Linear PDEs

When one seeks to reduce a second order linear PDE to canonical form, the notion of char- 
acteristic curves (or just characteristics) is often useful. The characteristics also have an 
interpretation as curves along which information can propagate, as we will endeavour to show 
later. 

Consider a 2nd order linear PDE with principal part $au_{xx} + 2bu_{xy} + cu_{yy}$ in new coordinates
$\xi = \xi(x, y)$, $\eta = \eta(x, y)$. We assume that the change is invertible, or equivalently, by the
Implicit Function Theorem, that $\xi_x\eta_y - \xi_y\eta_x \neq 0$. Do the following exercise now:






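Exercise 13.4.1 (a) Recall the chain rule and show that

$$u_{xx} = u_{\xi\xi}\xi_x^2 + 2u_{\xi\eta}\xi_x\eta_x + u_{\eta\eta}\eta_x^2 + u_\xi\xi_{xx} + u_\eta\eta_{xx}$$
$$u_{yy} = u_{\xi\xi}\xi_y^2 + 2u_{\xi\eta}\xi_y\eta_y + u_{\eta\eta}\eta_y^2 + u_\xi\xi_{yy} + u_\eta\eta_{yy}$$

(b) Hence show that the principal part of the PDE in transformed coordinates is

$$\bar a u_{\xi\xi} + 2\bar b u_{\xi\eta} + \bar c u_{\eta\eta}$$

where

$$\bar a = a\xi_x^2 + 2b\xi_x\xi_y + c\xi_y^2 \qquad \bar c = a\eta_x^2 + 2b\eta_x\eta_y + c\eta_y^2$$

and

$$\bar b = a\xi_x\eta_x + b(\xi_x\eta_y + \xi_y\eta_x) + c\xi_y\eta_y$$

(c) Then show that

$$\bar A = J^{tr}AJ \quad\text{where}\quad \bar A = \begin{pmatrix}\bar a & \bar b\\ \bar b & \bar c\end{pmatrix}, \quad A = \begin{pmatrix}a & b\\ b & c\end{pmatrix}, \quad J = \begin{pmatrix}\xi_x & \eta_x\\ \xi_y & \eta_y\end{pmatrix}$$

i.e. $J$ is the Jacobian matrix of the map $(\xi, \eta) : \mathbb{R}^2 \to \mathbb{R}^2$, so that

$$\bar b^2 - \bar a\bar c = (b^2 - ac)(\xi_x\eta_y - \xi_y\eta_x)^2$$

Thus the type (elliptic,
parabolic or hyperbolic) of a 2nd order linear PDE doesn't change under an invertible
change of variables, linear or non-linear.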

□ 

The principal part will simplify to just a $u_{\xi\eta}$-term when $\bar a = 0 = \bar c$, i.e. when both $\xi, \eta$ solve
the equation

$$(z_x \;\; z_y)\,A\begin{pmatrix}z_x\\ z_y\end{pmatrix} = 0 \quad\text{i.e.}\quad az_x^2 + 2bz_xz_y + cz_y^2 = 0 \qquad (\star)$$

This is the characteristic equation of the PDE. The level curves $z(x, y) = \text{const.}$ are known as
the characteristics of the PDE. To determine its solutions, we use the following result, which
reduces the problem to an ODE:

Theorem 13.4.2 $z(x, y) = \text{const.}$ is a characteristic iff $z(x, y) = \text{const.}$ is a solution of the
equation

$$a(dy)^2 - 2b(dx)(dy) + c(dx)^2 = 0 \qquad (\dagger)$$

Proof: Suppose first that $z(x, y) = \gamma = \text{const.}$ is a characteristic curve, i.e. a solution of $(\star)$.
If $(\star)$ is not vacuous, then either $z_x(x, y)$ or $z_y(x, y)$ is $\neq 0$. Assume $z_y(x, y) \neq 0$, so that the
equation $z(x, y) = \gamma$ defines $y$ in terms of $x$, i.e.

$$z(x, y) = \gamma \iff y = f_\gamma(x)$$

Then along the curve $y = f_\gamma(x)$ we have $dz = 0$, and so

$$\frac{dy}{dx} = -\frac{z_x(x, y)}{z_y(x, y)}$$

From $(\star)$, upon dividing each term by $z_y^2$, it now follows that

$$a\left(\frac{dy}{dx}\right)^2 - 2b\frac{dy}{dx} + c = 0$$

with $(\dagger)$ as immediate consequence.

A similar argument holds if $z_y = 0$ but $z_x \neq 0$.

Conversely, suppose that $z(x, y) = \text{const.}$ is a general solution of $(\dagger)$. We want to show
that $z(x, y)$ satisfies $(\star)$ at an arbitrary point. Given an arbitrary point $(x_0, y_0)$, define
$\gamma_0 = z(x_0, y_0)$, and consider the curve $y = f_{\gamma_0}(x)$. Along this curve, we have

$$a\left(\frac{z_x}{z_y}\right)^2 + 2b\frac{z_x}{z_y} + c = a\left(\frac{dy}{dx}\right)^2 - 2b\frac{dy}{dx} + c = 0$$

In particular, with $x = x_0$ (and thus $y = f_{\gamma_0}(x_0) = y_0$) we see that $(\star)$ holds at $(x_0, y_0)$.

⊣

Examples 13.4.3 1. The characteristics of the 1D wave equation $u_{tt} - \alpha^2u_{xx} = 0$ are
determined as follows: The characteristic equation is $z_t^2 - \alpha^2z_x^2 = 0$, i.e. $a = 1$, $b = 0$, $c = -\alpha^2$,
and so the characteristic curves are the solutions of $(dx)^2 - \alpha^2(dt)^2 = 0$. Hence $\frac{dx}{dt} = \pm\alpha$,
i.e. $x = \pm\alpha t + \text{const.}$ So there are two families of characteristic curves, given by
$\xi = x + \alpha t = \text{const.}$ and $\eta = x - \alpha t = \text{const.}$

2. The characteristics of the 1D heat equation $u_t - u_{xx} = 0$ are determined as follows: We
have $a = 0 = b$ and $c = -1$, so the characteristic curves are solutions of $(dt)^2 = 0$, i.e.
$t = \text{const.}$ There is just one family of characteristic curves, given by $\eta = t = \text{const.}$

3. The characteristic curves of the 2D Laplace equation $u_{xx} + u_{yy} = 0$ are determined as
follows: We have $a = 1 = c$ and $b = 0$, so the characteristic curves are solutions of
$(dy)^2 + (dx)^2 = 0$, which has no non-constant (real) solution. Hence Laplace's equation
has no (real) characteristics.

□ 
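For the wave equation in item 1, the characteristic coordinates can be fed into the formulas of Exercise 13.4.1(b). The following worked computation is a sketch in which $\alpha$ denotes the wave speed, $t$ plays the role of $x$ and $x$ the role of $y$ in those formulas, and $F, G$ stand for arbitrary twice differentiable functions.

```latex
% Wave equation u_tt - alpha^2 u_xx = 0, i.e. a = 1, b = 0, c = -alpha^2,
% with characteristic coordinates xi = x + alpha t, eta = x - alpha t.
\begin{align*}
\xi_t &= \alpha, \quad \xi_x = 1, \qquad \eta_t = -\alpha, \quad \eta_x = 1, \\
\bar a &= \xi_t^2 - \alpha^2\xi_x^2 = 0, \qquad
\bar c = \eta_t^2 - \alpha^2\eta_x^2 = 0, \\
\bar b &= \xi_t\eta_t - \alpha^2\xi_x\eta_x = -2\alpha^2 .
\end{align*}
% xi and eta are linear in (t, x), so no first-order terms appear, and the PDE becomes
\[
2\bar b\,u_{\xi\eta} = -4\alpha^2 u_{\xi\eta} = 0
\quad\Longrightarrow\quad
u = F(\xi) + G(\eta) = F(x + \alpha t) + G(x - \alpha t).
\]
```

This agrees with the general solution found by factorizing the wave operator in Example 13.2.7.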

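Examples 13.4.4 1. Problem: Transform the equation $u_{xx} - x^2yu_{yy} = 0$, $(y > 0)$ to a
canonical form with principal part $u_{\xi\eta}$.

Solution: We have $b^2 - ac = x^2y > 0$ when $y > 0$, so the equation is hyperbolic. We first
determine the characteristics, i.e. the solutions of $(dy)^2 - x^2y(dx)^2 = 0$. This yields an
ODE

$$\frac{dy}{dx} = \pm x\sqrt{y}$$

which is separable: $\frac{dy}{\sqrt{y}} = \pm x\,dx$, and hence $2\sqrt{y} = \pm\frac{x^2}{2} + \text{const.}$, which yields

$$x^2 \pm 4\sqrt{y} = \text{const.}$$

We therefore define

$$\xi = x^2 + 4\sqrt{y} \qquad \eta = x^2 - 4\sqrt{y}$$

Then, since $\bar A = J^{tr}AJ$, where $J$ is the Jacobian, i.e.

$$J = \begin{pmatrix}\xi_x & \eta_x\\ \xi_y & \eta_y\end{pmatrix} = \begin{pmatrix}2x & 2x\\ \frac{2}{\sqrt{y}} & -\frac{2}{\sqrt{y}}\end{pmatrix}$$

we obtain

$$\bar a = 0 = \bar c \qquad \bar b = 8x^2 = 4(\xi + \eta)$$

so that the principal part of the PDE in transformed coordinates is $8(\xi + \eta)u_{\xi\eta}$. We leave
it as an exercise that the PDE transforms to

$$4(\xi^2 - \eta^2)u_{\xi\eta} + (3\xi + \eta)u_\xi - (\xi + 3\eta)u_\eta = 0$$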

2. Problem: Transform the equation $e^{2x}u_{xx} + 2e^{x+y}u_{xy} + e^{2y}u_{yy} = 0$ to a canonical form
without mixed partial derivatives.

Solution: We have $b^2 - ac = 0$ so that this PDE is parabolic. We therefore expect to be
able to reduce it to something like the heat equation. Specifically, we seek coordinates $\xi, \eta$
so that the principal part of the PDE is $u_{\xi\xi}$.

First, we determine the characteristics, i.e. the solutions of

$$e^{2x}(dy)^2 - 2e^{x+y}(dx)(dy) + e^{2y}(dx)^2 = 0$$

This yields $\left(\frac{dy}{dx}\right)^2 - 2e^{y-x}\frac{dy}{dx} + e^{2(y-x)} = 0$, so that

$$\frac{dy}{dx} = e^{y-x} \quad\text{i.e.}\quad e^{-x}\,dx - e^{-y}\,dy = 0$$

and hence the characteristics are the curves $e^{-x} - e^{-y} = \text{const.}$ We seek coordinates
$\xi, \eta$ so that the PDE will have principal part $u_{\xi\xi}$, i.e. we want $\bar b = 0 = \bar c$. Now $\bar c =
a\eta_x^2 + 2b\eta_x\eta_y + c\eta_y^2$, and thus to ensure that $\bar c = 0$ we see that $\eta$ must satisfy the
characteristic equation. As the PDE remains parabolic under the transformation, we then
will also have $\bar b^2 - \bar a\bar c = 0$, which immediately implies that $\bar b = 0$ as well, as desired.
We thus define $\eta$ to be a solution of the characteristic equation, i.e. $\eta = e^{-x} - e^{-y}$.
As for $\xi$, it can be anything (subject to the constraint that $\xi_x\eta_y - \xi_y\eta_x \neq 0$). We choose
^ = X. Then d = a^^ + 2b£,x^y + c^y = e^^'. Some calculations show that the PDe becomes 
e^^u^^ + 2ujj = 0, and thus that 

u^^ + 2e~^^Ur^ = 

□ 
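The coefficient bookkeeping in these two examples can be checked numerically. The following is a minimal sketch, assuming the convention that a PDE $a u_{xx} + 2b u_{xy} + c u_{yy} + \dots = 0$ transforms to one with principal part $\bar{a} u_{\xi\xi} + 2\bar{b} u_{\xi\eta} + \bar{c} u_{\eta\eta}$; the function names are illustrative and not part of the notes.

```python
import math

def transformed_principal_part(a, b, c, xi_x, xi_y, eta_x, eta_y):
    """Return (a_bar, b_bar, c_bar) for a*u_xx + 2b*u_xy + c*u_yy under (x, y) -> (xi, eta)."""
    a_bar = a * xi_x**2 + 2 * b * xi_x * xi_y + c * xi_y**2
    b_bar = a * xi_x * eta_x + b * (xi_x * eta_y + xi_y * eta_x) + c * xi_y * eta_y
    c_bar = a * eta_x**2 + 2 * b * eta_x * eta_y + c * eta_y**2
    return a_bar, b_bar, c_bar

def example_hyperbolic(x, y):
    # u_xx - x^2 y u_yy = 0 with xi = x^2 + 4 sqrt(y), eta = x^2 - 4 sqrt(y)
    s = math.sqrt(y)
    return transformed_principal_part(1.0, 0.0, -x * x * y,
                                      2 * x, 2 / s, 2 * x, -2 / s)

def example_parabolic(x, y):
    # e^{2x} u_xx + 2 e^{x+y} u_xy + e^{2y} u_yy = 0 with xi = x, eta = e^{-x} - e^{-y}
    return transformed_principal_part(math.exp(2 * x), math.exp(x + y), math.exp(2 * y),
                                      1.0, 0.0, -math.exp(-x), math.exp(-y))
```

In the hyperbolic example only the mixed coefficient survives, with $\bar{b} = 8x^2$; in the parabolic example only $\bar{a} = e^{2x}$ survives.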



Remarks 13.4.5 1. The characteristic curves of a second-order PDE $a u_{xx} + 2b u_{xy} + c u_{yy} +
d u_x + e u_y + f u = g$ can also be interpreted as exceptional curves: The PDE together with
the values $u, u_x, u_y$ along the curve do not uniquely determine the values of $u_{xx}, u_{xy}, u_{yy}$.
Let us see what is meant by this: Take a smooth curve $C$ in $\mathbb{R}^2$, parametrized by a
coordinate $s$, i.e. there is an interval $I$ and maps $x, y : I \to \mathbb{R}$ so that $C$ is the set of
all points $(x(s), y(s))$ for $s \in I$. Suppose that $u, u_x, u_y$ are known along $C$, i.e. there are
functions $F, G, H : I \to \mathbb{R}$ so that

$$u(x(s),y(s)) = F(s) \qquad u_x(x(s),y(s)) = G(s) \qquad u_y(x(s),y(s)) = H(s)$$

Then we obtain three equations in the three unknowns $u_{xx}, u_{xy}, u_{yy}$ at the point $(x(s), y(s))$.
The first is the PDE, and the other two are obtained by differentiating $G = u_x$, $H = u_y$






w.r.t. $s$:

$$\begin{aligned}
a u_{xx} + 2b u_{xy} + c u_{yy} &= g - dG(s) - eH(s) - fF(s)\\
u_{xx}\,x'(s) + u_{xy}\,y'(s) &= G'(s)\\
u_{xy}\,x'(s) + u_{yy}\,y'(s) &= H'(s)
\end{aligned}$$

where $a, b, \dots, g$ are functions of $s$: $a = a(x(s), y(s))$, etc.
To solve this linear system we look at the coefficient matrix

$$A := \begin{pmatrix} a & 2b & c\\ x'(s) & y'(s) & 0\\ 0 & x'(s) & y'(s) \end{pmatrix}$$

We will not be able to solve the system uniquely precisely when $\det(A) = 0$, i.e. when

$$a\,y'(s)^2 - 2b\,x'(s)\,y'(s) + c\,x'(s)^2 = 0$$

which implies the defining equation of the characteristics

$$a(dy)^2 - 2b(dx)(dy) + c(dx)^2 = 0$$

Thus the characteristics are precisely those curves along which the second derivatives are
indeterminate (given the PDE and lower order derivatives).

2. Related to the above is the following fact: the characteristics are the only curves along
which discontinuities in the solution $u$ and its derivatives can occur. We won't prove this
here, but it may be seen intuitively as follows: If $u, u_x, u_y$ are continuous in a region,
then $u_{xx}, u_{xy}, u_{yy}$, being determined by $u, u_x, u_y$ along any smooth curve in that region
(except the characteristics!), will be continuous also. So if any of the second-order
derivatives is discontinuous, it must be along a characteristic curve.

This has important consequences for the nature of a solution of a PDE:

• Laplace's equation has no characteristic curves; hence solutions must be smooth.

• The heat equation has only one set of characteristic curves, namely the curves $t =$
constant. So discontinuities can only occur along curves where $t$ is constant. Since
we are interested in the time evolution of the solution to the heat equation, we never
hold $t$ constant. Hence if discontinuities occur at $t = 0$, they cannot spread into the
region $t > 0$. So even if the initial temperature has discontinuities, the temperature
will be smooth at any later time.

• The wave equation $u_{tt} = a^2 u_{xx}$ has two sets of characteristics $x \pm at = \text{const.}$, which extend
from the initial time $t = 0$ into the region $t > 0$. Hence singularities can propagate
along these characteristic curves. It is for hyperbolic PDEs that characteristic curves
are most important, and we will therefore make scant use of them from here on.

□ 
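The determinant condition in Remark 1 is easy to check by machine. The sketch below is illustrative only (the helper names are assumptions, not from the notes): it builds the coefficient matrix with rows $(a, 2b, c)$, $(x', y', 0)$, $(0, x', y')$ and compares its determinant with the quadratic form $a(dy)^2 - 2b(dx)(dy) + c(dx)^2$.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as nested lists (cofactor expansion)."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def coefficient_matrix(a, b, c, xp, yp):
    """Matrix of the linear system for (u_xx, u_xy, u_yy) along a curve with tangent (xp, yp)."""
    return [[a, 2 * b, c],
            [xp, yp, 0.0],
            [0.0, xp, yp]]

def characteristic_form(a, b, c, dx, dy):
    """Left-hand side of the characteristic equation a dy^2 - 2b dx dy + c dx^2."""
    return a * dy**2 - 2 * b * dx * dy + c * dx**2
```

For the wave equation $u_{tt} - a^2 u_{xx} = 0$ (first variable $t$, second variable $x$) the form vanishes in the directions $dx/dt = \pm a$, while for Laplace's equation it is positive in every nonzero direction.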

For n-dimensional second order linear PDEs, the characteristic surfaces are the level 
surfaces 

$$z(x_1, \dots, x_n) = \text{const.}$$






of solutions of the characteristic equation 




This cannot usually be reduced to an ODE if $n > 2$, and other techniques need to be used.
However, when the PDE has constant coefficients, we can always use an orthogonal change
of variables (using an orthonormal basis of eigenvectors) to remove the mixed partial
derivatives.

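Example 13.4.6 Consider the PDE

$$3u_{xx} - 2u_{xy} + 2u_{yy} - 2u_{yz} + 3u_{zz} + 12u_y - 8u_z = 0$$

The principal part has matrix

$$A = \begin{pmatrix} 3 & -1 & 0\\ -1 & 2 & -1\\ 0 & -1 & 3 \end{pmatrix}$$

The eigenvalues are given by the equation $\det(A - \lambda I) = 0$, and are found to be $\lambda = 1, 3, 4$.
All the eigenvalues are strictly positive, so this PDE is elliptic. Corresponding orthonormal
eigenvectors are easily determined to be $\frac{1}{\sqrt{6}}(1, 2, 1)^{tr}$, $\frac{1}{\sqrt{2}}(1, 0, -1)^{tr}$, $\frac{1}{\sqrt{3}}(1, -1, 1)^{tr}$. Thus we
use the transformation $\boldsymbol{\xi} = O^{tr}\mathbf{x}$, where $O$ is the orthogonal matrix whose columns are the
orthonormal eigenvectors of $A$, i.e.

$$O = \begin{pmatrix} \frac{1}{\sqrt{6}} & \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{3}}\\ \frac{2}{\sqrt{6}} & 0 & -\frac{1}{\sqrt{3}}\\ \frac{1}{\sqrt{6}} & -\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{3}} \end{pmatrix}$$

to obtain a PDE with principal part

$$u_{\xi_1\xi_1} + 3u_{\xi_2\xi_2} + 4u_{\xi_3\xi_3}$$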

□ 
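The eigenvalue data of this example can be verified directly. The sketch below assumes that the principal-part matrix is the symmetric matrix with rows $(3, -1, 0)$, $(-1, 2, -1)$, $(0, -1, 3)$, which is consistent with the stated eigenvalues $1, 3, 4$ and eigenvectors; the helper names are illustrative.

```python
import math

# Assumed principal-part matrix (consistent with the eigenvalues 1, 3, 4 quoted in the example).
A = [[3.0, -1.0, 0.0],
     [-1.0, 2.0, -1.0],
     [0.0, -1.0, 3.0]]

# Eigenvalue / unit eigenvector pairs as stated in the example.
EIGENPAIRS = [
    (1.0, [1 / math.sqrt(6), 2 / math.sqrt(6), 1 / math.sqrt(6)]),
    (3.0, [1 / math.sqrt(2), 0.0, -1 / math.sqrt(2)]),
    (4.0, [1 / math.sqrt(3), -1 / math.sqrt(3), 1 / math.sqrt(3)]),
]

def matvec(m, v):
    """Product of a 3x3 matrix with a 3-vector."""
    return [sum(m[i][j] * v[j] for j in range(3)) for i in range(3)]

def dot(v, w):
    return sum(p * q for p, q in zip(v, w))

def eigen_residual(lam, v):
    """Largest component of A v - lam v; zero (up to rounding) for a true eigenpair."""
    av = matvec(A, v)
    return max(abs(av[i] - lam * v[i]) for i in range(3))
```

Because the eigenvectors are orthonormal, the matrix $O$ with these columns satisfies $O^{tr} A O = \mathrm{diag}(1, 3, 4)$, which is what removes the mixed derivatives.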






Chapter 14 

Laplace's Equation 



14.1 The Divergence Theorem and Related Results 

This section serves to remind you of one of the basic tools in PDE theory: The Divergence (or
Gauss-Green) Theorem. Let $U \subset \mathbb{R}^n$ be a bounded open set, with a smooth boundary $\partial U$,
and let $\mathbf{n}$ denote the outward pointing unit normal vector to the boundary $\partial U$. ($\mathbf{n}$ is actually
a vector field, i.e. a map $\mathbf{n} : \partial U \to \mathbb{R}^n$: We have an outward pointing normal vector at
every point $\mathbf{x} \in \partial U$.) For any $u \in C^1(\bar{U})$, we denote the directional derivative in the direction
of $\mathbf{n}$ by $\frac{\partial u}{\partial n}$:

$$\frac{\partial u}{\partial n}(\mathbf{x}) = Du(\mathbf{x}) \cdot \mathbf{n}(\mathbf{x})$$

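Lemma 14.1.1 If $U$ is a ball centered at the origin, and $r$ is the radial component of ($n$-
dimensional) polar coordinates, then on $\partial U$

$$\frac{\partial}{\partial n} = \frac{\partial}{\partial r}$$

Proof: The unit normal at the point $\mathbf{x}$ is $\mathbf{n} = \frac{\mathbf{x}}{\|\mathbf{x}\|} = \frac{\mathbf{x}}{r}$, where $r = \left(\sum_{k=1}^n x_k^2\right)^{1/2}$. Note also that
$\frac{\partial x_k}{\partial r} = \frac{x_k}{r}$, because each coordinate $x_k$ has polar coordinates of the form $x_k = r\prod_j \mathrm{trig}_j(\theta_j)$,
where each function $\mathrm{trig}_j$ is either $\sin$ or $\cos$, and the $\theta_j$ are angles. Hence

$$\frac{\partial u}{\partial n} = \sum_{k=1}^n \frac{\partial u}{\partial x_k}\,\frac{x_k}{r} = \sum_{k=1}^n \frac{\partial u}{\partial x_k}\,\frac{\partial x_k}{\partial r} = \frac{\partial u}{\partial r}$$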



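All forms of the Divergence Theorem follow from the following basic tool:
Proposition 14.1.2 Suppose that $u \in C^1(\bar{U})$. Then, for $i = 1, \dots, n$,

$$\int_U u_{x_i}\,d\mathbf{x} = \oint_{\partial U} u\,n_i\,dS$$

(where $u_{x_i} := \frac{\partial u}{\partial x_i}$ and $\mathbf{n} = (n_1, \dots, n_n)$ is the outward pointing unit normal.)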






Proof: We give an outline of the proof in three dimensions, with $x_i = z$-coordinate. The
same ideas work in higher dimensions (but note that the proof makes heavy use of spatial
intuition, and that a rigorous proof is actually quite difficult, even in three dimensions). A
simple region is a region which is cylindrical in the direction of one of the coordinate axes, with
smooth disjoint top and bottom surfaces. A sufficiently regular bounded open set $U$ can be
decomposed into many "cubes" $C_i$, which will be simple. The volume integrals $\int_{C_i} u_z\,d\mathbf{x}$ over
each component cube will add up to the volume integral over the whole of $U$: $\sum_i \int_{C_i} u_z\,d\mathbf{x} =
\int_U u_z\,d\mathbf{x}$. If two cubes are adjacent, i.e. share a common surface, then the surface integrals
over that common surface will cancel each other out, as the outward unit normal of the one
will be minus the outward unit normal of the other. Thus the surface integrals of all the cubes
will add up to the surface integral over $\partial U$: $\sum_i \oint_{\partial C_i} u\,n_z\,dS = \oint_{\partial U} u\,n_z\,dS$. It follows that if
the divergence theorem holds for each of these simple cubes, i.e. if $\int_{C_i} u_z\,d\mathbf{x} = \oint_{\partial C_i} u\,n_z\,dS$
for all $i$, then it will hold for the whole region $U$:

$$\int_U u_z\,d\mathbf{x} = \sum_i \int_{C_i} u_z\,d\mathbf{x} = \sum_i \oint_{\partial C_i} u\,n_z\,dS = \oint_{\partial U} u\,n_z\,dS$$

Consider now a simple "cube" $C$, cylindrical in the direction of the $z$-axis, with smooth
top and bottom. Let the top surface be given by $z = t(x,y)$ and the bottom surface by
$z = b(x,y)$, where $b, t$ are $C^1$-functions. Also let $A$ be the projection of the cube onto the
$xy$-plane, so that $A$ is a rectangle in the $xy$-plane, with sides parallel to the axes. Thus

$$C = \{(x, y, z) \in \mathbb{R}^3 : (x, y) \in A \text{ and } b(x, y) < z < t(x, y)\}$$

We first calculate $\oint_{\partial C} u\,n_z\,dS$. On the lateral sides, the $z$-component of the outward
unit normal is zero, so the surface integrals over the lateral surfaces do not contribute to
$\oint_{\partial C} u\,n_z\,dS$. We thus have

$$\oint_{\partial C} u\,n_z\,dS = \int_{\text{top}} u\,n_z\,dS + \int_{\text{bottom}} u\,n_z\,dS$$

Let's look at the integral over the top surface: If $\theta$ is the (acute) angle between the outward
unit normal of the top surface and the $z$-axis, then $dS = \frac{dx\,dy}{\cos\theta}$. Moreover, $n_z = \cos\theta$.
Hence $\int_{\text{top}} u(x,y,z)\,n_z\,dS = \int_A u(x,y,t(x,y))\,dx\,dy$.

Now consider the integral over the bottom surface: If $\varphi$ is the (obtuse) angle between the
outward unit normal of the bottom surface and the $z$-axis, then $\cos\varphi < 0$. It follows that
$dS = -\frac{dx\,dy}{\cos\varphi}$ and $n_z = \cos\varphi$, so that $\int_{\text{bottom}} u(x,y,z)\,n_z\,dS = -\int_A u(x,y,b(x,y))\,dx\,dy$. It
follows that

$$\oint_{\partial C} u\,n_z\,dS = \int_A u(x,y,t(x,y)) - u(x,y,b(x,y))\,dx\,dy$$

But

$$\int_C u_z\,d\mathbf{x} = \int_A \left(\int_{b(x,y)}^{t(x,y)} u_z\,dz\right) dx\,dy = \int_A u(x,y,t(x,y)) - u(x,y,b(x,y))\,dx\,dy$$

by the Fundamental Theorem of Calculus, because $\int_b^t \frac{\partial u}{\partial z}(x, y, z)\,dz = u(x, y, t) - u(x, y, b)$.
Hence $\int_C u_z\,d\mathbf{x} = \oint_{\partial C} u\,n_z\,dS$, as required, and the result follows.

H 






We have the following corollaries: 

Theorem 14.1.3 (Integration-by-parts formula) Suppose that $u, v \in C^1(\bar{U})$, where $U \subset \mathbb{R}^n$
is a bounded open set with smooth boundary. Then

$$\int_U u_{x_i} v\,d\mathbf{x} = -\int_U u\,v_{x_i}\,d\mathbf{x} + \oint_{\partial U} u v\,n_i\,dS$$

□ 



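Theorem 14.1.4 (Divergence Theorem: vector form) If $\mathbf{u} : \bar{U} \to \mathbb{R}^n$ is continuously
differentiable, where $U \subset \mathbb{R}^n$ is a bounded open set with smooth boundary, then

$$\int_U \operatorname{div}\mathbf{u}\,d\mathbf{x} = \oint_{\partial U} \mathbf{u}\cdot\mathbf{n}\,dS$$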



□ 



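Theorem 14.1.5 (Green's Formulas) Suppose that $u, v \in C^2(\bar{U})$, where $U \subset \mathbb{R}^n$ is a bounded
open set with smooth boundary. Then

I. $\int_U \Delta u\,d\mathbf{x} = \oint_{\partial U} \frac{\partial u}{\partial n}\,dS$

II. $\int_U Du\cdot Dv\,d\mathbf{x} = -\int_U u\,\Delta v\,d\mathbf{x} + \oint_{\partial U} u\,\frac{\partial v}{\partial n}\,dS$

III. $\int_U u\,\Delta v - v\,\Delta u\,d\mathbf{x} = \oint_{\partial U} u\,\frac{\partial v}{\partial n} - v\,\frac{\partial u}{\partial n}\,dS$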



□ 



Exercise 14.1.6 Prove the preceding theorems. 

[Hints: For the integration by parts formula, apply Proposition 14.1.2 to the function $uv$.
For the vector form of the divergence theorem, apply Proposition 14.1.2 to each of the
components of the vector-valued function $\mathbf{u}$ and sum.

For Green I, use the integration-by-parts formula with $u_{x_i}$ in place of $u$ and $v = 1$, and sum over $i$.
For Green II, use integration-by-parts with $v_{x_i}$ in place of $v$, and sum over $i$.
For Green III, use Green II.]



□ 
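Before proving Green's formulas it can help to see one of them hold numerically. The sketch below checks Green I, $\int_U \Delta u\,d\mathbf{x} = \oint_{\partial U} \frac{\partial u}{\partial n}\,dS$, on the unit disk for the test function $u(x,y) = x^4 + y^2$. Both the test function and the midpoint-rule quadrature are choices made here for illustration; they are not taken from the notes.

```python
import math

def laplacian_u(x, y):
    # u(x, y) = x^4 + y^2, so the Laplacian is 12 x^2 + 2
    return 12.0 * x * x + 2.0

def grad_u(x, y):
    return (4.0 * x**3, 2.0 * y)

def volume_integral(n_r=400, n_theta=400):
    """Midpoint rule in polar coordinates for the integral of the Laplacian over the unit disk."""
    dr, dth = 1.0 / n_r, 2.0 * math.pi / n_theta
    total = 0.0
    for i in range(n_r):
        r = (i + 0.5) * dr
        for j in range(n_theta):
            th = (j + 0.5) * dth
            total += laplacian_u(r * math.cos(th), r * math.sin(th)) * r * dr * dth
    return total

def boundary_flux(n_theta=400):
    """Midpoint rule for the integral of Du . n over the unit circle, where n = (cos th, sin th)."""
    dth = 2.0 * math.pi / n_theta
    total = 0.0
    for j in range(n_theta):
        th = (j + 0.5) * dth
        nx, ny = math.cos(th), math.sin(th)
        gx, gy = grad_u(nx, ny)
        total += (gx * nx + gy * ny) * dth
    return total
```

Both integrals equal $5\pi$ exactly for this choice of $u$, and the two numerical values agree with that to within the quadrature error.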



14.2 Harmonic Functions 

In this section, we study harmonic functions. These are exactly the solutions to Laplace's
equation, i.e. a $C^2$-function $u : \mathbb{R}^n \to \mathbb{R}$ is said to be harmonic iff

$$\Delta u = 0$$



14.2.1 Some Heuristic Remarks about the Laplace Operator 

The Laplace operator in $\mathbb{R}^n$ is

$$\Delta = \sum_{i=1}^n \frac{\partial^2}{\partial x_i^2}$$

In one dimension, $\Delta u(x) = u''(x) > 0$ means that $u$ is (strictly) concave up at $x$.
This implies that if we take two neighbouring points $x - a, x + a$ (with $a$ small), we have

$$u(x) < \tfrac{1}{2}u(x - a) + \tfrac{1}{2}u(x + a)$$

i.e. the value $u(x)$ is smaller than the average of its neighbours. In higher dimensions,
$\Delta u(\mathbf{x}) > 0$ does not imply that the function is concave up: Consider, for example, $u(x, y) :=
2x^2 - y^2$, which has $\Delta u = 2 > 0$ but is concave up along the $x$-axis (where $y = 0$) and concave down along the $y$-axis
(where $x = 0$). (Indeed, we know that the correct generalization of the second derivative test
is that the Hessian matrix be positive definite. Here, the Hessian is $\begin{pmatrix} 4 & 0\\ 0 & -2 \end{pmatrix}$, which is not
positive definite.) However, the averaging property does hold in higher dimensions:

If $\Delta u > 0$, then the value of $u$ at $\mathbf{x}$ is smaller than the average of the values of its neighbours.

This is an imprecise statement, whose purpose is only to provide intuition. We shall formalize 
it in the next section, when we discuss the mean value property. For the moment, however, 
let's give an intuitive argument in two dimensions — the argument can easily be generalized 
to higher dimensions. Note that a second-order Taylor expansion yields 

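$$u(\mathbf{x} + \Delta\mathbf{x}) = u(\mathbf{x}) + Du(\mathbf{x})\cdot\Delta\mathbf{x} + \tfrac{1}{2}\Delta\mathbf{x}^{tr}\,D^2u(\mathbf{x})\,\Delta\mathbf{x} + o(\|\Delta\mathbf{x}\|^2)$$

where $D^2u(\mathbf{x}) = \begin{pmatrix} u_{xx} & u_{xy}\\ u_{yx} & u_{yy} \end{pmatrix}$ is the Hessian matrix of $u$ evaluated at the point $\mathbf{x}$. Fix $h \in \mathbb{R}$,
substitute the following four values of $\Delta\mathbf{x}$ into the Taylor expansion

$$\Delta\mathbf{x} = (h, h) \qquad \Delta\mathbf{x} = (h, -h) \qquad \Delta\mathbf{x} = (-h, h) \qquad \Delta\mathbf{x} = (-h, -h)$$

and then add to obtain

$$\frac{u(x+h, y+h) + u(x+h, y-h) + u(x-h, y+h) + u(x-h, y-h)}{4} = u(x, y) + \tfrac{1}{2}(u_{xx} + u_{yy})h^2 + o(h^2) > u(x, y)$$

for all sufficiently small $h \neq 0$, whenever $\Delta u(x, y) > 0$.
We thus see that $u(x, y)$ is less than the average of the values at the neighbouring points $(x \pm
h, y \pm h)$.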
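The four-point averaging identity can be observed numerically. The following sketch (the function names and the two test functions are illustrative assumptions) computes the average of $u$ over the diagonal neighbours $(x \pm h, y \pm h)$ minus $u(x, y)$, which by the expansion above should be $\tfrac{1}{2}\Delta u\,h^2 + o(h^2)$.

```python
import math

def four_point_excess(u, x, y, h):
    """Average of u over the four diagonal neighbours (x±h, y±h), minus u(x, y)."""
    average = (u(x + h, y + h) + u(x + h, y - h)
               + u(x - h, y + h) + u(x - h, y - h)) / 4.0
    return average - u(x, y)

def quadratic(x, y):
    # Laplacian is 2 + 6 = 8 everywhere, so the excess should be (1/2) * 8 * h^2.
    return x * x + 3.0 * y * y + x * y

def harmonic(x, y):
    # e^x sin(y) has zero Laplacian, so the excess should vanish to order h^2.
    return math.exp(x) * math.sin(y)
```

For the quadratic test function the excess is exactly $4h^2$; for the harmonic one it is of order $h^4$, in line with the heuristic that a harmonic function equals the average of its neighbours.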

As stated before, we will make this precise in the next section. For the moment, note that 
the three canonical examples of second order linear PDEs are 

$0 = \Delta u$ (Laplace's equation)
$u_t = \Delta u$ (heat equation)
$u_{tt} = \Delta u$ (wave equation)

We can now make some qualitative remarks about the solutions of these PDEs: 






• If $u$ satisfies Laplace's equation, it means that the value of $u$ at a point $\mathbf{x}$ equals the
average of the values of nearby points.

• If $u$ satisfies the heat equation, then $u_t > 0$ when $\Delta u > 0$. This means that the
temperature at a point $\mathbf{x}$ will increase ($u_t > 0$) when it is less than the average temperature
at neighbouring points.

• If $u$ satisfies the wave equation, then $u_{tt} > 0$ when $\Delta u > 0$. This means that a point
at position $x$ on a vibrating string will accelerate upwards when its displacement is less
than the average displacement at neighbouring points.



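14.2.2 Volumes and Surface Areas of Balls in $\mathbb{R}^n$

We will leave the proof of the following theorem as an exercise, as we shall only apply it in
the case $n \le 3$, where the result is presumably well-known:

Theorem 14.2.1 In $\mathbb{R}^n$, the "volume" $V_n(R)$ and "surface area" $A_n(R)$ of a ball with radius
$R$ are given by

$$A_n(R) = \frac{2\pi^{n/2}}{\Gamma(n/2)}\,R^{n-1} = \begin{cases} \dfrac{n(2\pi)^{n/2}}{2\cdot 4\cdot 6\cdots n}\,R^{n-1} & \text{if } n \text{ is even}\\[2ex] \dfrac{2n(2\pi)^{(n-1)/2}}{1\cdot 3\cdot 5\cdots n}\,R^{n-1} & \text{if } n \text{ is odd} \end{cases} \qquad V_n(R) = \frac{R}{n}\,A_n(R)$$

(where $\Gamma(x) := \int_0^\infty e^{-t}t^{x-1}\,dt$ is the Euler Gamma function).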

□ 

Exercise 14.2.2 We prove the above result.

(a) First, we show that $\int_{\mathbb{R}^n} e^{-\|\mathbf{x}\|^2}\,d\mathbf{x} = \pi^{n/2}$. To prove this, note that if $C := \int_{-\infty}^{\infty} e^{-x^2}\,dx$,
then

$$C^n = \prod_{j=1}^n \left(\int_{-\infty}^{\infty} e^{-x_j^2}\,dx_j\right) = \int_{\mathbb{R}^n} e^{-\|\mathbf{x}\|^2}\,d\mathbf{x}$$

For $n = 2$, use polar coordinates to deduce that

$$C^2 = \int_0^{2\pi}\!\!\int_0^\infty e^{-r^2} r\,dr\,d\theta = \pi$$

so that $C = \pi^{1/2}$.

(b) Converting to $n$-dimensional polar coordinates, note that

$$\int_{\mathbb{R}^n} e^{-\|\mathbf{x}\|^2}\,d\mathbf{x} = A_n(1)\int_0^\infty e^{-r^2} r^{n-1}\,dr$$

where $A_n(r)$ denotes the surface area of an $n$-dimensional ball of radius $r$. This formula
will also hold for $n = 1$, provided we define $A_1(1) = 2$.

(c) Conclude that

$$\pi^{n/2} = \tfrac{1}{2}A_n(1)\,\Gamma\!\left(\tfrac{n}{2}\right)$$

(d) Show that the Gamma function $\Gamma$ has the following properties:

(i) $\Gamma(x + 1) = x\Gamma(x)$.

(ii) $\Gamma(1) = 1$. Hence $\Gamma(n) = (n - 1)!$ for $n \in \mathbb{N}$.

(iii) $\Gamma(\tfrac{1}{2}) = \pi^{1/2}$. Hence $\Gamma(n + \tfrac{1}{2}) = \pi^{1/2}(n - \tfrac{1}{2})(n - \tfrac{3}{2})\cdots(\tfrac{1}{2})$.

[Hint: For (i), use integration by parts. For (iii), use (c) and the fact that $A_1(1) = 2$.]

(e) Now verify that the formulas for areas $A_n(r)$ given in Thm. 14.2.1 are correct.

(f) Finally, show that

$$V_n(R) = \int_0^R A_n(r)\,dr = A_n(R)\cdot\frac{R}{n}$$

□ 
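The Gamma-function formula is easy to evaluate with the standard library. The sketch below assumes the closed form $A_n(R) = \frac{2\pi^{n/2}}{\Gamma(n/2)}R^{n-1}$ together with $V_n(R) = \frac{R}{n}A_n(R)$ from part (f), and checks the familiar low-dimensional cases; the function names are illustrative.

```python
import math

def sphere_area(n, radius=1.0):
    """Surface area A_n(R) of the boundary of a ball of radius R in R^n."""
    return 2.0 * math.pi ** (n / 2.0) / math.gamma(n / 2.0) * radius ** (n - 1)

def ball_volume(n, radius=1.0):
    """Volume V_n(R) = (R / n) * A_n(R) of a ball of radius R in R^n."""
    return radius * sphere_area(n, radius) / n
```

For $n = 1, 2, 3$ this reproduces $A_1(1) = 2$, the circumference $2\pi R$ and area $\pi R^2$ of a disk, and the area $4\pi R^2$ and volume $\tfrac{4}{3}\pi R^3$ of a ball.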



14.2.3 Mean-Value Property and the Maximum Principle

Definition 14.2.3 A function $u$ defined on an open set $U \subset \mathbb{R}^n$ is said to satisfy the mean-
value property at $\mathbf{x}_0 \in U$ if

$$u(\mathbf{x}_0) = \frac{1}{A_n(R)}\oint_{\partial B(\mathbf{x}_0,R)} u(\mathbf{x})\,dS(\mathbf{x})$$

for every ball $B(\mathbf{x}_0, R)$ of radius $R > 0$ centered at $\mathbf{x}_0$ whose closure is contained in $U$.

□

The integral in the definition above is a surface integral over the surface (i.e. boundary) of
the ball $B(\mathbf{x}_0, R)$.

Theorem 14.2.4 (Mean Value Property of Harmonic Functions) Let $U \subset \mathbb{R}^n$ be an open
set.

(a) If $u \in C^2(U) \cap C^0(\bar{U})$ is harmonic in $U$, then $u$ satisfies the mean value property at each
$\mathbf{x}_0 \in U$.

(b) Conversely, if a function $u \in C^2(U) \cap C^0(\bar{U})$ satisfies the mean value property, then it is
harmonic.

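Proof: (a) For $r > 0$, define

$$\phi(r) := \frac{1}{A_n(r)}\oint_{\partial B(\mathbf{x}_0,r)} u(\mathbf{x})\,dS(\mathbf{x}) = \frac{1}{A_n(1)}\oint_{\partial B(\mathbf{0},1)} u(\mathbf{x}_0 + r\mathbf{z})\,dS(\mathbf{z})$$

Then

$$\phi'(r) = \frac{1}{A_n(1)}\oint_{\partial B(\mathbf{0},1)} Du(\mathbf{x}_0 + r\mathbf{z})\cdot\mathbf{z}\,dS(\mathbf{z}) = \frac{1}{A_n(r)}\oint_{\partial B(\mathbf{x}_0,r)} Du(\mathbf{x})\cdot\frac{\mathbf{x} - \mathbf{x}_0}{r}\,dS(\mathbf{x})$$

Now $\mathbf{n} := \frac{\mathbf{x} - \mathbf{x}_0}{r}$ is precisely the outward unit normal to $\partial B(\mathbf{x}_0, r)$ at $\mathbf{x}$, so by the divergence
theorem (noting that $\operatorname{div} Du = \Delta u$) we get

$$\phi'(r) = \frac{1}{A_n(r)}\oint_{\partial B(\mathbf{x}_0,r)} Du\cdot\mathbf{n}\,dS = \frac{1}{A_n(r)}\int_{B(\mathbf{x}_0,r)} \Delta u(\mathbf{x})\,d\mathbf{x}$$

Since $u$ is harmonic on $U$, $\Delta u = 0$ on $B(\mathbf{x}_0, r) \subset U$, and so $\phi'(r) = 0$. It follows that $\phi$ is
constant. In particular

$$\frac{1}{A_n(r)}\oint_{\partial B(\mathbf{x}_0,r)} u(\mathbf{x})\,dS(\mathbf{x}) = \phi(r) = \lim_{s\to 0}\phi(s) = \lim_{s\to 0}\frac{1}{A_n(s)}\oint_{\partial B(\mathbf{x}_0,s)} u(\mathbf{x})\,dS(\mathbf{x}) = u(\mathbf{x}_0)$$

which proves (a).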

(b) Suppose that $u$ has the mean value property, but is not harmonic. Then there is an open
set $V \subset U$ where $\Delta u > 0$, or else one where $\Delta u < 0$. Without loss of generality, assume the
former holds, i.e. that $\Delta u > 0$ on $V \subset U$. Let $\mathbf{x}_0 \in V$, and define $\phi(r)$ as in (a). Note that
we obtained there the following equation:

$$\phi'(r) = \frac{1}{A_n(r)}\int_{B(\mathbf{x}_0,r)} \Delta u(\mathbf{x})\,d\mathbf{x}$$

It follows that $\phi'(r) > 0$ for all $r$ small enough that $B(\mathbf{x}_0, r) \subset V$. On the other hand, the mean value property at $\mathbf{x}_0$ is precisely the
statement that $\phi(r)$ is a constant function with value $u(\mathbf{x}_0)$, a contradiction.

H 

As an easy corollary of (a), we prove a "solid" version of the mean value property: 
Corollary 14.2.5 

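If a twice continuously differentiable function $u$ is harmonic in an open set $U \subset \mathbb{R}^n$, then

$$u(\mathbf{x}_0) = \frac{1}{V_n(r)}\int_{B(\mathbf{x}_0,r)} u(\mathbf{x})\,d\mathbf{x}$$

for every $r > 0$ such that $\bar{B}(\mathbf{x}_0, r) \subset U$.
Proof: Clearly

$$\frac{1}{V_n(r)}\int_{B(\mathbf{x}_0,r)} u(\mathbf{x})\,d\mathbf{x} = \frac{1}{V_n(r)}\int_0^r\left(\oint_{\partial B(\mathbf{x}_0,s)} u(\mathbf{x})\,dS(\mathbf{x})\right)ds = \frac{1}{V_n(r)}\int_0^r A_n(s)\,u(\mathbf{x}_0)\,ds = u(\mathbf{x}_0)$$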

H 
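The mean value property can be observed numerically in two dimensions. The sketch below (function names and test functions are choices made here, not taken from the notes) averages a function over a circle by uniform sampling and compares the result with the value at the centre.

```python
import math

def circle_average(u, x0, y0, r, n=2000):
    """Average of u over the circle of radius r about (x0, y0), using n equally spaced points."""
    total = 0.0
    for k in range(n):
        th = 2.0 * math.pi * k / n
        total += u(x0 + r * math.cos(th), y0 + r * math.sin(th))
    return total / n

def harmonic_example(x, y):
    # x^2 - y^2 + 3x has zero Laplacian.
    return x * x - y * y + 3.0 * x

def subharmonic_example(x, y):
    # x^2 + y^2 has Laplacian 4 > 0, so circle averages exceed the centre value.
    return x * x + y * y
```

For the harmonic example the circle average reproduces the centre value; for $x^2 + y^2$ the average exceeds the centre value by exactly $r^2$, matching the heuristic of Section 14.2.1.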

An intuitively plausible consequence of the mean value property is that harmonic functions
are very smooth: If $u$ is harmonic, then the value $u(\mathbf{x})$ of $u$ at $\mathbf{x}$ is the average of the $u(\mathbf{y})$
for neighbouring points $\mathbf{y}$, and this averaging ensures smoothness. We won't show it here,
but it can be proved that harmonic functions are analytic, i.e. $C^\infty$-functions (infinitely
differentiable) which are everywhere equal to their Taylor series expansions (in their domain
of convergence).

Another important consequence of the mean value property is the following, the strong 
maximum principle: The maximum of a harmonic function on a bounded open set occurs on 
the boundary. Furthermore if an interior point is a maximum, then u is constant: 

Theorem 14.2.6 (a) Let $U$ be a bounded open set, and suppose that $u$ is harmonic in $U$ and
continuous on $\bar{U}$. Then

$$\max_{\mathbf{x}\in\bar{U}} u(\mathbf{x}) = \max_{\mathbf{x}\in\partial U} u(\mathbf{x})$$

i.e. the maximum occurs on the boundary.

(b) Furthermore, if $U$ is connected and there is an interior point $\mathbf{x}_0 \in U$ such that

$$u(\mathbf{x}_0) = \max_{\mathbf{x}\in\bar{U}} u(\mathbf{x})$$

then $u$ is constant on $U$.

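Proof: We first prove (b): If the maximum of $u$ on $\bar{U}$ occurs at some interior point $\mathbf{x}_0 \in U$,
then since

$$u(\mathbf{x}_0) = \frac{1}{V_n(r)}\int_{B(\mathbf{x}_0,r)} u(\mathbf{x})\,d\mathbf{x}$$

and $u(\mathbf{x}) \le u(\mathbf{x}_0)$ throughout the ball, we see that we must have $u(\mathbf{x}) = u(\mathbf{x}_0)$ for every $\mathbf{x} \in B(\mathbf{x}_0, r)$. Now given an arbitrary point
$\mathbf{y} \in U$ we can connect $\mathbf{x}_0$ to $\mathbf{y}$ by a finite sequence of overlapping open balls in $U$ (i.e. balls
$B_0, B_1, \dots, B_m \subset U$ so that $\mathbf{x}_0$ is the center of $B_0$, $\mathbf{y}$ the center of $B_m$ and $B_{i-1} \cap B_i \neq \emptyset$ for
$i = 1, \dots, m$.) Then $u$ is constant on each ball, and hence $u(\mathbf{x}_0) = u(\mathbf{y})$. Since $\mathbf{y} \in U$ was
arbitrary, $u$ is constant on $U$. This proves (b).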

(a) follows straightaway: If u is constant, then of course the maximum occurs on the boundary 
(and everywhere else also). If u is not constant, then the maximum cannot occur at an interior 
point, by (b), so must occur at a boundary point. 

H 

Now when u is harmonic, so is —u. Applying the maximum principle to —u leads to an 
analogous "minimum principle" for u: The minimum of a harmonic function on a bounded 
open set occurs on the boundary. Moreover, if it also occurs in the interior, then u is constant. 

Exercise 14.2.7 Interpret and verify the results of this section for one-dimensional harmonic 
functions, i.e. functions u{x) having u" = 0. 

□ 

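Exercise 14.2.8 Laplace's equation is invariant under translations and rotations: If $u :
\mathbb{R}^n \to \mathbb{R}$ has $\Delta u = 0$, and if

$$v(\mathbf{x}) := u(\mathbf{x} + \mathbf{c}) \qquad w(\mathbf{x}) := u(O\mathbf{x})$$

where $\mathbf{c} \in \mathbb{R}^n$, and $O$ is an orthogonal $n \times n$-matrix, then $\Delta v = 0$ and $\Delta w = 0$.

[Recall from linear algebra that a matrix is orthogonal iff $O^{-1} = O^{tr}$. The orthogonal matrices
are precisely the linear transformations which preserve lengths, i.e. rotations (possibly combined with reflections): For if $\mathbf{x} \in \mathbb{R}^n$, then

$$\|O\mathbf{x}\|^2 = (O\mathbf{x})^{tr}\cdot(O\mathbf{x}) = \mathbf{x}^{tr}O^{tr}O\mathbf{x} = \mathbf{x}^{tr}\mathbf{x} = \|\mathbf{x}\|^2$$

i.e. $\|O\mathbf{x}\| = \|\mathbf{x}\|$.]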

□ 

Exercise 14.2.9 Here is a proof of the maximum principle. It is weaker than the proof based
on the mean value property, as it doesn't show the absence of maxima at interior points, but
only that there are boundary maxima. Nevertheless, it is quite intuitive, appealing only to
the second derivative test for maxima, and will improve understanding.

Let $u \in C^2(U) \cap C^0(\bar{U})$ be subharmonic on $U$, where $U \subset \mathbb{R}^n$ is a bounded open set, i.e.
assume that $\Delta u \ge 0$. Since $U$ is bounded, there is $R$ such that $U \subset B(\mathbf{0}, R)$.

(a) Suppose that $u$ has a maximum at an interior point $\mathbf{x}_0 \in U$. Explain why $u_{x_ix_i}(\mathbf{x}_0) \le 0$
for all $i = 1, \dots, n$.

(b) Conclude that $\Delta u(\mathbf{x}_0) \le 0$.

(c) At most maxima, we have $u_{x_ix_i} < 0$. If this is the case, we have $\Delta u < 0$, contradicting
the fact that $u$ is subharmonic. But it is possible that $\Delta u = 0$ at a maximum. Give an
example of such a $u$.

(d) To get around this, define

$$v(\mathbf{x}) := u(\mathbf{x}) + \varepsilon\|\mathbf{x}\|^2$$

for $\varepsilon > 0$. Show that

$$\Delta v = \Delta u + 2n\varepsilon > 0$$

where $n$ is the dimension of the space.

(e) Conclude that $v$ has no maximum in the interior $U$.

(f) Explain why $v$ must have a maximum on $\bar{U}$, and conclude that $v$ has a maximum at some
point $\mathbf{x}_0 \in \partial U$ on the boundary.

(g) Conclude that if $\mathbf{x} \in U$ is in the interior, then

$$u(\mathbf{x}) \le v(\mathbf{x}) \le v(\mathbf{x}_0) = u(\mathbf{x}_0) + \varepsilon\|\mathbf{x}_0\|^2 \le \sup_{\mathbf{y}\in\partial U} u(\mathbf{y}) + \varepsilon R^2$$

(h) Explain why we may conclude that for all interior points $\mathbf{x} \in U$ we have

$$u(\mathbf{x}) \le \sup_{\mathbf{y}\in\partial U} u(\mathbf{y})$$

(i) Also explain why there is $\mathbf{x}_M \in \partial U$ such that $\sup_{\mathbf{y}\in\partial U} u(\mathbf{y}) = u(\mathbf{x}_M)$.
(j) Conclude that a maximum of $u$ occurs at a boundary point.






□ 

Remarks 14.2.10 Here are just a few comments for those who know some complex analysis.
A holomorphic (i.e. differentiable) function $f : \mathbb{C} \to \mathbb{C}$ may be regarded as a pair of real-
valued functions $u, v : \mathbb{R}^2 \to \mathbb{R}$:

$$f(z) = f(x + iy) = u(x, y) + iv(x, y)$$

But if a function $f$ is holomorphic, then the Cauchy-Riemann equations must hold:

$$u_x = v_y \quad\text{and}\quad u_y = -v_x$$

(Recall that these are obtained by computing $f'(z)$ in two ways: On the one hand $f'(z) =
\lim_{\Delta x\to 0}\frac{f(z + \Delta x) - f(z)}{\Delta x} = u_x(x, y) + iv_x(x, y)$. But on the other hand, we have $f'(z) =
\lim_{\Delta y\to 0}\frac{f(z + i\Delta y) - f(z)}{i\Delta y} = -iu_y(x, y) + v_y(x, y)$. Hence $u_x + iv_x = f' = v_y - iu_y$. Equating
real and imaginary parts, one obtains the Cauchy-Riemann equations.) Differentiating the first
of the Cauchy-Riemann equations with respect to $x$, and the second with respect to $y$, we
see that $u_{xx} = v_{yx}$ and $u_{yy} = -v_{xy}$. As $v_{xy} = v_{yx}$ we see that

$$u_{xx} + u_{yy} = 0$$

In the same manner, differentiating the first equation w.r.t. $y$ and the second w.r.t. $x$ one
derives

$$v_{xx} + v_{yy} = 0$$

Hence the real and imaginary parts of a holomorphic function are harmonic in $\mathbb{R}^2$.

In complex analysis, the maximum modulus theorem is the analogue of the maximum
principle. The mean value property, by which the value of a harmonic function at the centre
of a ball is completely determined by its values on the boundary of that ball, is clearly related to the uniqueness of
analytic continuation. It is also well-known that holomorphic functions are analytic, i.e. $C^\infty$
and everywhere equal to their Taylor series expansion, and a similar result can be proved for
$n$-dimensional harmonic functions.

□ 

14.3 Solving Laplace's Equation 
14.3.1 Uniqueness of Solutions 

The canonical example of an elliptic equation is Laplace's equation

$$\Delta u = 0$$

and its non-homogeneous version, Poisson's equation

$$-\Delta u = f$$

The solutions of Laplace's equation are precisely the harmonic functions. Problems which
are well-posed for Poisson's or Laplace's equation will usually be well-posed for more general
elliptic problems as well. The maximum principle allows us to immediately prove a partial
well-posedness result for the inhomogeneous Dirichlet problem for the Poisson equation:






Theorem 14.3.1 Let $U \subset \mathbb{R}^n$ be a bounded open set. The Dirichlet problem

$$\Delta u = f \text{ in } U, \qquad u = g \text{ on } \partial U$$

has at most one solution.

Proof: If $u_1, u_2$ are two solutions to the above Dirichlet problem, then $u := u_1 - u_2$ satisfies
the corresponding homogeneous problem

$$\Delta u = 0 \text{ in } U, \qquad u = 0 \text{ on } \partial U$$

Thus $u$ is harmonic. Since its maximum and minimum must occur on the boundary, where $u$
takes the value 0, we see that $0 \le u(\mathbf{x}) \le 0$ for all $\mathbf{x} \in U$. It follows that $u_1 = u_2$, i.e. that
the solution, if it exists, is unique.

H 

Remarks 14.3.2 Note the contrast with the wave equation: Consider a vibrating string
of length 1, fixed at its endpoints. Let $u(t, x)$ be the displacement at time $t$ at point $x$,
where $0 \le x \le 1$, and suppose that we seek a solution with the property that the initial
displacement is zero, i.e. $u(0, x) = 0$ for all $x$, and that the displacement at time $T = 1$ is
zero, i.e. $u(1, x) = 0$ for all $x$. Here, we are considering $U = (0, 1) \times (0, 1) \subset \mathbb{R}^2$, and the
Dirichlet problem to solve is

$$u_{tt} - u_{xx} = 0 \text{ in } U, \qquad u = 0 \text{ on } \partial U$$

For each $n \in \mathbb{N}$, the function

$$u_n(t, x) = \sin(n\pi x)\sin(n\pi t)$$

is clearly a solution, as

$$\frac{\partial^2 u_n}{\partial t^2} = -n^2\pi^2 u_n = \frac{\partial^2 u_n}{\partial x^2} \quad\text{and}\quad \sin n\pi = 0 = \sin 0$$

Thus comparing the two seemingly similar Dirichlet problems

$$u_{tt} + u_{xx} = 0 \text{ in } U \text{ (Laplace)} \qquad u_{tt} - u_{xx} = 0 \text{ in } U \text{ (Wave)}$$

$$u = 0 \text{ on } \partial U \qquad\qquad\qquad u = 0 \text{ on } \partial U$$

we see that Laplace's equation has a unique solution (namely $u = 0$), whereas the wave
equation has many.

As the uniqueness of the solution to the Laplace equation follows from the maximum
principle, there can be no analogous maximum principle for the wave equation.

□ 
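The non-uniqueness claimed in this remark can be confirmed numerically. The sketch below uses central finite differences (a choice made here for illustration; the step size is an assumption) to check that each $u_n(t, x) = \sin(n\pi x)\sin(n\pi t)$ satisfies $u_{tt} - u_{xx} = 0$ and vanishes on the boundary of the unit square.

```python
import math

def u_n(n, t, x):
    """Candidate solution sin(n pi x) sin(n pi t) of the wave equation on the unit square."""
    return math.sin(n * math.pi * x) * math.sin(n * math.pi * t)

def wave_residual(n, t, x, h=1e-3):
    """Central-difference approximation of u_tt - u_xx for u_n at the point (t, x)."""
    centre = u_n(n, t, x)
    u_tt = (u_n(n, t + h, x) - 2.0 * centre + u_n(n, t - h, x)) / h**2
    u_xx = (u_n(n, t, x + h) - 2.0 * centre + u_n(n, t, x - h)) / h**2
    return u_tt - u_xx
```

Each $u_n$ is a nonzero solution with zero boundary data, so the Dirichlet problem for the wave equation on this square has infinitely many solutions.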






14.3.2 Fundamental Solution of Laplace's Equation 

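Exercise 14.2.8 shows that Laplace's equation is invariant under rotations. This suggests that
we seek a radial solution to Laplace's equation, i.e. a solution

$$u(\mathbf{x}) = v(r), \qquad r = (x_1^2 + x_2^2 + \dots + x_n^2)^{1/2}$$

which is a function solely of $r$. Note that

$$\frac{\partial r}{\partial x_i} = \frac{x_i}{r} \qquad (r \neq 0)$$

and so for $i = 1, \dots, n$ we have

$$u_{x_i} = v'(r)\,\frac{x_i}{r}, \qquad u_{x_ix_i} = v''(r)\,\frac{x_i^2}{r^2} + v'(r)\left(\frac{1}{r} - \frac{x_i^2}{r^3}\right)$$

It follows that

$$\Delta u = v''(r) + \frac{n-1}{r}\,v'(r)$$

and so

$$\Delta u = 0 \iff v'' + \frac{n-1}{r}\,v' = 0$$

We thus have a second order ODE to solve. The solution depends on the dimension $n$:
Assuming $v$ is not constant, i.e. $v' \neq 0$, we get

$$(\ln|v'|)' = \frac{v''}{v'} = \frac{1-n}{r}$$

from which $\ln|v'| = (1 - n)\ln r + C$, i.e.

$$v'(r) = \frac{A}{r^{n-1}}$$

for some constant $A$. Integrating again, we see that

$$v(r) = \begin{cases} B\ln r + C & \text{if } n = 2\\[1ex] \dfrac{B}{r^{n-2}} + C & \text{if } n \ge 3 \end{cases}$$

where $B, C$ are constants.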

With the above in mind, we now define 

Definition 14.3.3 The function

$$\Psi(\mathbf{x}) := \begin{cases} -\dfrac{1}{2\pi}\ln\|\mathbf{x}\| & \text{if } n = 2\\[2ex] \dfrac{1}{(n-2)A_n(1)}\,\dfrac{1}{\|\mathbf{x}\|^{n-2}} & \text{if } n \ge 3 \end{cases}$$

defined for $\mathbf{x} \in \mathbb{R}^n - \{\mathbf{0}\}$ is called the fundamental solution of Laplace's equation. Here, $A_n(1)$
is the surface area of a unit ball in $\mathbb{R}^n$.

We may on occasion write $\Psi(\|\mathbf{x}\|)$ or $\Psi(r)$ for $\Psi(\mathbf{x})$, to highlight the fact that the fundamental
solution $\Psi$ is radially symmetric.

□

Note that $\Psi$ blows up near the origin, and is undefined at the origin. By construction, $\Psi$ is harmonic
in any region that does not contain the origin.

The reason for the choice of the constants $-\frac{1}{2\pi}$ and $\frac{1}{(n-2)A_n(1)}$ is made clear by the next
lemma, which will prove useful a number of times:

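Lemma 14.3.4 If $B$ is a ball of radius $R$ in $\mathbb{R}^n$, centered at $\mathbf{x}_0$, then the outward normal
derivative satisfies

$$\frac{\partial\Psi}{\partial n}(\mathbf{x} - \mathbf{x}_0) = -\frac{1}{A_n(R)} \qquad\text{for } \mathbf{x} \in \partial B$$

Proof: Let $r = \|\mathbf{x} - \mathbf{x}_0\|$. For $n \ge 3$, we have $\Psi(\mathbf{x} - \mathbf{x}_0) = \frac{1}{(n-2)A_n(1)}\frac{1}{r^{n-2}}$; for $n = 2$ we
have $\Psi(\mathbf{x} - \mathbf{x}_0) = -\frac{1}{2\pi}\ln r$. The outward unit normal derivative on $\partial B$ is given by

$$\frac{\partial}{\partial n} = \frac{\partial}{\partial r}$$

by the same argument as given in the proof of Lemma 14.1.1. But clearly, for $n \ge 3$,

$$\left.\frac{\partial\Psi}{\partial r}\right|_{r=R} = \frac{1}{(n-2)A_n(1)}\left.\frac{\partial}{\partial r}\right|_{r=R}\frac{1}{r^{n-2}} = -\frac{1}{A_n(1)R^{n-1}} = -\frac{1}{A_n(R)}$$

The same argument holds when $n = 2$.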
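Both properties of the fundamental solution used here can be checked numerically: that $v = \Psi$ satisfies the radial equation $v'' + \frac{n-1}{r}v' = 0$, and that its radial derivative at $r = R$ equals $-1/A_n(R)$. The sketch below assumes the sign convention $\Psi = -\frac{1}{2\pi}\ln r$ for $n = 2$ and $A_n(1) = 2\pi^{n/2}/\Gamma(n/2)$; the function names and finite-difference step sizes are illustrative choices.

```python
import math

def unit_sphere_area(n):
    """A_n(1) = 2 pi^{n/2} / Gamma(n/2)."""
    return 2.0 * math.pi ** (n / 2.0) / math.gamma(n / 2.0)

def fundamental_solution(n, r):
    """Radial profile of the fundamental solution of Laplace's equation in R^n, n >= 2."""
    if n == 2:
        return -math.log(r) / (2.0 * math.pi)
    return 1.0 / ((n - 2) * unit_sphere_area(n) * r ** (n - 2))

def radial_derivative(n, r, h=1e-5):
    """Central-difference approximation of dPsi/dr."""
    return (fundamental_solution(n, r + h) - fundamental_solution(n, r - h)) / (2.0 * h)

def radial_laplacian(n, r, h=1e-4):
    """Central-difference approximation of Psi'' + (n - 1)/r * Psi'."""
    second = (fundamental_solution(n, r + h) - 2.0 * fundamental_solution(n, r)
              + fundamental_solution(n, r - h)) / h**2
    return second + (n - 1) / r * radial_derivative(n, r)
```

The normalising constants are exactly what makes the flux of $\Psi$ through any sphere about the singularity equal to $-1$, independent of the radius.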



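We can represent all harmonic functions as surface integrals involving the fundamental solution:

Theorem 14.3.5 Suppose that $u \in C^2(\bar{U})$, where $U \subset \mathbb{R}^n$ is a bounded open set with smooth boundary. Then

$$u(\mathbf{x}_0) = -\int_U \Psi(\mathbf{x} - \mathbf{x}_0)\,\Delta u(\mathbf{x})\,d\mathbf{x} - \oint_{\partial U} u(\mathbf{x})\,\frac{\partial\Psi}{\partial n}(\mathbf{x} - \mathbf{x}_0)\,dS + \oint_{\partial U} \Psi(\mathbf{x} - \mathbf{x}_0)\,\frac{\partial u}{\partial n}\,dS$$

for all $\mathbf{x}_0 \in U$.

Proof: Fix $\mathbf{x}_0 \in U$, and define $v(\mathbf{x}) := \Psi(\mathbf{x} - \mathbf{x}_0)$. Note that $\Delta v = 0$ away from $\mathbf{x}_0$. We start with Green's
formula III, which states that

$$\int_U u\,\Delta v - v\,\Delta u\,d\mathbf{x} = \oint_{\partial U} u\,\frac{\partial v}{\partial n} - v\,\frac{\partial u}{\partial n}\,dS \qquad (*)$$

But Green's formula (*) holds only when $u, v$ are everywhere defined in $U$, which is not the
case here, as $v(\mathbf{x}) = \Psi(\mathbf{x} - \mathbf{x}_0)$ is undefined at $\mathbf{x} = \mathbf{x}_0$.

We therefore isolate the singularity $\mathbf{x}_0$: Let $B_\varepsilon$ be a closed ball of radius $\varepsilon$ about $\mathbf{x}_0$,
contained entirely within $U$, and let $U_\varepsilon := U - B_\varepsilon$. Then $U_\varepsilon$ is a bounded open set which does
not contain $\mathbf{x}_0$, and so $u, v$ are everywhere defined on $U_\varepsilon$. Applying Green III, and using $\Delta v = 0$ in $U_\varepsilon$, we see that

$$-\int_{U_\varepsilon} v\,\Delta u\,d\mathbf{x} = \oint_{\partial U_\varepsilon} u\,\frac{\partial v}{\partial n} - v\,\frac{\partial u}{\partial n}\,dS$$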



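Now $\partial U_\varepsilon = \partial U \cup \partial B_\varepsilon$ is the disjoint union of the boundary of $U$ and $B_\varepsilon$, and so $\oint_{\partial U_\varepsilon} =
\oint_{\partial U} + \oint_{\partial B_\varepsilon}$. It follows that

$$-\int_{U_\varepsilon} v\,\Delta u\,d\mathbf{x} - \oint_{\partial U} u\,\frac{\partial v}{\partial n} - v\,\frac{\partial u}{\partial n}\,dS = \oint_{\partial B_\varepsilon} u\,\frac{\partial v}{\partial n} - v\,\frac{\partial u}{\partial n}\,dS$$

We will be done once we show that the righthand side of the above equation converges to
$u(\mathbf{x}_0)$ as $\varepsilon \to 0$.

Consider, therefore, $\oint_{\partial B_\varepsilon} u\,\frac{\partial v}{\partial n} - v\,\frac{\partial u}{\partial n}\,dS$. On $\partial B_\varepsilon$ the outward unit normal derivative $\frac{\partial}{\partial n}$
is just $-\frac{\partial}{\partial r}$, where $r := \|\mathbf{x} - \mathbf{x}_0\|$, as "outward" from $U_\varepsilon$ means "towards $\mathbf{x}_0$". It follows
that

$$\oint_{\partial B_\varepsilon} u\,\frac{\partial v}{\partial n} - v\,\frac{\partial u}{\partial n}\,dS = -\oint_{\partial B_\varepsilon} u\,\frac{\partial v}{\partial r}\,dS + \oint_{\partial B_\varepsilon} v\,\frac{\partial u}{\partial r}\,dS$$

Let's evaluate the first term of this integral. We consider the case $n \ge 3$, and leave the
$n = 2$-case as an exercise. For $n \ge 3$, we have $\frac{\partial v}{\partial r} = -\frac{1}{A_n(1)r^{n-1}}$ and $r = \varepsilon$ on $\partial B_\varepsilon$, so

$$-\oint_{\partial B_\varepsilon} u\,\frac{\partial v}{\partial r}\,dS = \frac{1}{A_n(1)\varepsilon^{n-1}}\oint_{\partial B_\varepsilon} u\,dS = \frac{1}{A_n(\varepsilon)}\oint_{\partial B_\varepsilon} u\,dS \to u(\mathbf{x}_0) \quad\text{as } \varepsilon \to 0$$

For the remaining term, let $M := \max\left\{\left|\frac{\partial u}{\partial r}(\mathbf{x})\right| : \|\mathbf{x} - \mathbf{x}_0\| = \varepsilon\right\}$ be the maximum value of $\left|\frac{\partial u}{\partial r}\right|$
on the (compact) surface $\partial B_\varepsilon$; since $Du$ is continuous, $M$ stays bounded as $\varepsilon \to 0$. Then we have

$$\left|\oint_{\partial B_\varepsilon} v\,\frac{\partial u}{\partial r}\,dS\right| = \frac{1}{(n-2)A_n(1)\varepsilon^{n-2}}\left|\oint_{\partial B_\varepsilon}\frac{\partial u}{\partial r}\,dS\right| \le \frac{1}{(n-2)A_n(1)\varepsilon^{n-2}}\,M A_n(\varepsilon) = \frac{M}{n-2}\,\varepsilon$$

and hence

$$\oint_{\partial B_\varepsilon} v\,\frac{\partial u}{\partial r}\,dS \to 0 \quad\text{as } \varepsilon \to 0$$

We thus see that

$$\oint_{\partial B_\varepsilon} u\,\frac{\partial v}{\partial n} - v\,\frac{\partial u}{\partial n}\,dS \to u(\mathbf{x}_0) + 0 = u(\mathbf{x}_0) \quad\text{as } \varepsilon \to 0$$

We deduce immediately that

$$-\int_U v\,\Delta u\,d\mathbf{x} - \oint_{\partial U} u\,\frac{\partial v}{\partial n} - v\,\frac{\partial u}{\partial n}\,dS = u(\mathbf{x}_0)$$

and we are done.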



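Exercise 14.3.6 Finish the proof of the preceding theorem by dealing with the $n = 2$-case,
i.e. show that when $u$ is defined on a bounded open set $U \subset \mathbb{R}^2$, then

$$u(\mathbf{x}_0) = \frac{1}{2\pi}\int_U \ln\|\mathbf{x}_0 - \mathbf{x}\|\,\Delta u(\mathbf{x})\,d\mathbf{x} + \frac{1}{2\pi}\oint_{\partial U} u(\mathbf{x})\,\frac{\partial}{\partial n}\ln\|\mathbf{x}_0 - \mathbf{x}\| - \frac{\partial u}{\partial n}\,\ln\|\mathbf{x}_0 - \mathbf{x}\|\,dS$$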



□ 






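When $u$ is harmonic, i.e. when $\Delta u = 0$, the above representation simplifies:

Corollary 14.3.7 Suppose that $u$ is harmonic on a bounded open set $U \subset \mathbb{R}^n$. Then

$$u(\mathbf{x}_0) = \oint_{\partial U} -u(\mathbf{x})\,\frac{\partial\Psi}{\partial n}(\mathbf{x} - \mathbf{x}_0) + \Psi(\mathbf{x} - \mathbf{x}_0)\,\frac{\partial u}{\partial n}\,dS$$

for all $\mathbf{x}_0 \in U$.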

□ 

It follows then that the value of a harmonic function $u$ at a point $\mathbf{x}_0$ is completely determined by
its behaviour "far away", i.e. by its behaviour on the boundary of any region that contains the
point $\mathbf{x}_0$. When compared with the mean value property, this result will seem less surprising.

Exercise 14.3.8 Suppose that $\varphi : \mathbb{R}^n \to \mathbb{R}$ is a $C^2$-function with compact support (i.e. a
twice continuously differentiable function that vanishes outside some compact set $K$). Show
that

$$\varphi(\mathbf{0}) = -\int_{\mathbb{R}^n} \Psi(\mathbf{x})\,\Delta\varphi(\mathbf{x})\,d\mathbf{x}$$

where $\Psi$ is the fundamental solution of Laplace's equation.

[Hint: Choose a bounded open set $U$ which is big enough so that $\varphi, \frac{\partial\varphi}{\partial n}$ vanish on $\partial U$ and
outside $U$.]

□ 

14.3.3 Green's Functions 

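Consider now the following Dirichlet problem:

$$-\Delta u = f \text{ in } U, \qquad u = g \text{ on } \partial U$$

where $U$ is a bounded open set with smooth (i.e. $C^1$) boundary. Fix $\mathbf{x} \in U$, and choose $\varepsilon > 0$
so that the closed ball $B_\varepsilon := \bar{B}(\mathbf{x}, \varepsilon) \subset U$, and let $U_\varepsilon = U - B_\varepsilon$.
In order to use the representation in Theorem 14.3.5, i.e.

$$u(\mathbf{x}) = -\int_U \Psi(\mathbf{y} - \mathbf{x})\,\Delta u(\mathbf{y})\,d\mathbf{y} + \oint_{\partial U} \Psi(\mathbf{y} - \mathbf{x})\,\frac{\partial u}{\partial n}(\mathbf{y}) - u(\mathbf{y})\,\frac{\partial\Psi}{\partial n}(\mathbf{y} - \mathbf{x})\,dS(\mathbf{y}) \qquad (*)$$

we need to know $\Delta u$ in $U$ (which we do) and also the normal derivative $\frac{\partial u}{\partial n}$ on $\partial U$
(which we don't). To get around this we seek, for this $\mathbf{x}$, a corrector function $h^{\mathbf{x}} = h^{\mathbf{x}}(\mathbf{y})$ which
solves an auxiliary boundary value problem:

$$\Delta h^{\mathbf{x}} = 0 \text{ in } U, \qquad h^{\mathbf{x}}(\mathbf{y}) = \Psi(\mathbf{y} - \mathbf{x}) \text{ on } \partial U$$

Assuming that such a function can be found, we see that, by Green's formula III,

$$-\int_U h^{\mathbf{x}}(\mathbf{y})\,\Delta u(\mathbf{y})\,d\mathbf{y} = \oint_{\partial U} u(\mathbf{y})\,\frac{\partial h^{\mathbf{x}}}{\partial n}(\mathbf{y}) - h^{\mathbf{x}}(\mathbf{y})\,\frac{\partial u}{\partial n}(\mathbf{y})\,dS(\mathbf{y})$$

which, since $h^{\mathbf{x}}(\mathbf{y}) = \Psi(\mathbf{y} - \mathbf{x})$ on $\partial U$, implies that

$$0 = \int_U h^{\mathbf{x}}(\mathbf{y})\,\Delta u(\mathbf{y})\,d\mathbf{y} + \oint_{\partial U} u(\mathbf{y})\,\frac{\partial h^{\mathbf{x}}}{\partial n}(\mathbf{y}) - \Psi(\mathbf{y} - \mathbf{x})\,\frac{\partial u}{\partial n}(\mathbf{y})\,dS(\mathbf{y}) \qquad (\dagger)$$

Adding (*), (†), we obtain

$$u(\mathbf{x}) = -\int_U \big(\Psi(\mathbf{y} - \mathbf{x}) - h^{\mathbf{x}}(\mathbf{y})\big)\,\Delta u(\mathbf{y})\,d\mathbf{y} - \oint_{\partial U} u(\mathbf{y})\,\frac{\partial}{\partial n}\big(\Psi(\mathbf{y} - \mathbf{x}) - h^{\mathbf{x}}(\mathbf{y})\big)\,dS(\mathbf{y})$$

We therefore define the Green's function $G(\mathbf{x}, \mathbf{y})$ for the region $U$ by

$$G(\mathbf{x}, \mathbf{y}) := \Psi(\mathbf{y} - \mathbf{x}) - h^{\mathbf{x}}(\mathbf{y})$$

and obtain

$$u(\mathbf{x}) = -\int_U G(\mathbf{x}, \mathbf{y})\,\Delta u(\mathbf{y})\,d\mathbf{y} - \oint_{\partial U} u(\mathbf{y})\,\frac{\partial G}{\partial n}(\mathbf{x}, \mathbf{y})\,dS(\mathbf{y})$$

Noting that we require $-\Delta u = f$ in $U$ and $u = g$ on $\partial U$, we see that we have proved the
following theorem: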

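Theorem 14.3.9 (Green's function solution of Laplace's equation)
Suppose that $u \in C^2(\bar{U})$ is a solution of the inhomogeneous Dirichlet problem

$$-\Delta u = f \text{ in } U, \qquad u = g \text{ on } \partial U$$

where $U$ is a bounded open set with $C^1$ boundary. The Green's function for $U$ is defined to be

$$G(\mathbf{x}, \mathbf{y}) = \Psi(\mathbf{y} - \mathbf{x}) - h^{\mathbf{x}}(\mathbf{y})$$

where, for $\mathbf{x} \in U$, the function $h^{\mathbf{x}}(\mathbf{y})$ is a solution to the BVP

$$\Delta h^{\mathbf{x}} = 0 \text{ in } U, \qquad h^{\mathbf{x}}(\mathbf{y}) = \Psi(\mathbf{y} - \mathbf{x}) \text{ on } \partial U$$

The solution $u(\mathbf{x})$ then has representation

$$u(\mathbf{x}) = \int_U f(\mathbf{y})\,G(\mathbf{x}, \mathbf{y})\,d\mathbf{y} - \oint_{\partial U} g(\mathbf{y})\,\frac{\partial G}{\partial n}(\mathbf{x}, \mathbf{y})\,dS(\mathbf{y})$$

(where the normal derivative $\frac{\partial}{\partial n}$ is understood to be with respect to the $\mathbf{y}$-variable.)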

□ 

We have therefore — in principle — solved the inhomogeneous Laplace equation: We just 
need to compute the Green's function for the appropriate region. However, this is usually 
a non-trivial task. We will give some examples shortly. For the moment, we want to get a 
better understanding of the meaning of the Green's function. 






14.3.4 Properties of Green's functions 

Proposition 14.3.10 The Green's function $G(x_0, x)$ for a bounded open set $U$ is the unique
$C^2(U)$-function such that, fixing $x_0$, and regarding $G^{x_0}(x) := G(x_0, x)$ as a function of $x$:

(i) $\Delta G = 0$ in $U$ (i.e. $G(x_0, x)$ is harmonic in $U$), except at the point $x = x_0$.

(ii) $G(x_0, x) - \Psi(x - x_0)$ is finite at $x_0$, and is harmonic.

(iii) $G(x_0, x) = 0$ for $x \in \partial U$.

□ 

Exercise 14.3.11 We prove Propn. 14.3.10 

(a) First prove uniqueness, i.e. that if $G_1, G_2$ are two functions satisfying (i)-(iii), then
$G_1 = G_2$.

(b) It therefore remains to show that the Green's function for a region satisfies (i)-(iii). Do
so.

[Recall that $G(x_0, x) := \Psi(x - x_0) - h^{x_0}(x)$ where the corrector function $h^{x_0}$ is the
solution of the auxiliary BVP

$$\Delta h^{x_0} = 0 \quad \text{in } U$$
$$h^{x_0}(x) = \Psi(x - x_0) \quad \text{on } \partial U$$

(You may assume that this BVP has a solution; its existence follows from existence theory 
for the Dirichlet problem.)] 

□ 

Remarks 14.3.12 It is clear from the preceding exercise that the corrector function $h(x_0, x) :=
h^{x_0}(x)$ is the unique harmonic function which subtracts (or adds) just the right amount to the
fundamental solution $\Psi(x_0, x) := \Psi(x - x_0)$, for each $x_0$, to ensure that $G := \Psi - h$ satisfies
the correct boundary conditions.

□ 

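Proposition 14.3.13 The Green's function $G(x, y)$ for a bounded open set $U$ is symmetric
in $x$ and $y$, i.e.

$$G(x, y) = G(y, x)$$

Proof: Fix $x_0, y_0 \in U$. We will show that $G(x_0, y_0) = G(y_0, x_0)$. If $x_0 = y_0$, there is nothing
to prove, so assume that $x_0 \neq y_0$, and define

$$u(x) := G(x_0, x) \qquad v(x) := G(y_0, x)$$

$u$ is singular at $x_0$ and $v$ at $y_0$, so we isolate the singularities in small closed balls
$A_\varepsilon := B(x_0, \varepsilon)$, $B_\varepsilon := B(y_0, \varepsilon)$, where $\varepsilon > 0$ is chosen sufficiently small to ensure that the balls are
disjoint and contained in $U$. Now define $U_\varepsilon := U - (A_\varepsilon \cup B_\varepsilon)$. Note that $u, v$ are harmonic in
$U_\varepsilon$ and that $u(x) = 0 = v(x)$ for $x \in \partial U$. By Green's Theorem

$$\int_{U_\varepsilon} (u\,\Delta v - v\,\Delta u)\,dx = \oint_{\partial U} \left(u\,\frac{\partial v}{\partial n} - v\,\frac{\partial u}{\partial n}\right) dS + I_\varepsilon + J_\varepsilon$$

where

$$I_\varepsilon := \oint_{\partial A_\varepsilon} \left(u\,\frac{\partial v}{\partial n} - v\,\frac{\partial u}{\partial n}\right) dS \qquad J_\varepsilon := -\oint_{\partial B_\varepsilon} \left(v\,\frac{\partial u}{\partial n} - u\,\frac{\partial v}{\partial n}\right) dS$$

Since $u, v$ are harmonic in $U_\varepsilon$ and zero on $\partial U$, we see that $I_\varepsilon + J_\varepsilon = 0$ for all $\varepsilon > 0$, so that
also

$$\lim_{\varepsilon \to 0} I_\varepsilon + \lim_{\varepsilon \to 0} J_\varepsilon = 0$$

We now compute $\lim_{\varepsilon \to 0} I_\varepsilon$: Since $v(x)$ is smooth (at least $C^2$) near $x_0$, its derivatives are
continuous, hence bounded near $x_0$. As $u(x) := G(x_0, x) := \Psi(x - x_0) - h^{x_0}(x)$, and as $h^{x_0}$
is smooth, hence bounded near $x_0$, it follows that

$$\left|\oint_{\partial A_\varepsilon} u\,\frac{\partial v}{\partial n}\,dS\right| \leq \sup_{\partial A_\varepsilon}\left|\frac{\partial v}{\partial n}\right| \oint_{\partial A_\varepsilon} |u|\,dS \to 0 \quad \text{as } \varepsilon \to 0$$

The other term of $I_\varepsilon$ is

$$-\oint_{\partial A_\varepsilon} v\,\frac{\partial u}{\partial n}\,dS \to v(x_0) \quad \text{as } \varepsilon \to 0$$

Hence $\lim_{\varepsilon \to 0} I_\varepsilon = v(x_0)$. In the same way, $\lim_{\varepsilon \to 0} J_\varepsilon = -u(y_0)$. As $\lim_{\varepsilon \to 0} I_\varepsilon + \lim_{\varepsilon \to 0} J_\varepsilon = 0$ we see that
$v(x_0) = u(y_0)$ and so

$$G(x_0, y_0) = u(y_0) = v(x_0) = G(y_0, x_0)$$
Since $x_0, y_0$ were arbitrary, the result follows.

∎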

One way to interpret the Green's function is that it is the kernel of a linear operator
which inverts the PDE. Consider again Theorem 14.3.9, and note that the Green's function
representation of the solution to the Poisson problem with homogeneous Dirichlet boundary
conditions

$$-\Delta v = f \quad \text{in } U$$
$$v = 0 \quad \text{on } \partial U$$

is particularly simple:

$$v(x) = \int_U G(x, y)\,f(y)\,dy$$

Define a linear operator (on functions) $L_P$ by

$$L_P[f](x) = \int_U G(x, y)\,f(y)\,dy$$

$L_P$ is said to be an integral operator with kernel $G$. It is immediately clear that, just as
integration is an inverse to differentiation, the integral operator $L_P$ is a kind of inverse to
the differential operator $-\Delta$:

$$-\Delta v = f \iff v = L_P[f]$$






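In the same way the Dirichlet problem with non-homogeneous boundary conditions

$$-\Delta w = 0 \quad \text{in } U$$
$$w = g \quad \text{on } \partial U$$

has representation

$$w(x) = -\oint_{\partial U} g(y)\,\frac{\partial}{\partial n}G(x, y)\,dS(y)$$

The integral operator $L_D$ with kernel $-\frac{\partial}{\partial n}G(x, y)$ inverts the Dirichlet problem.
Now note that if $v, w$ are solutions to

$$-\Delta v = f \qquad -\Delta w = 0 \quad \text{in } U$$
$$v = 0 \qquad w = g \quad \text{on } \partial U$$

then, by linearity, it follows that $u = v + w$ is a solution to a general Dirichlet-Poisson problem

$$-\Delta u = f \quad \text{in } U$$
$$u = g \quad \text{on } \partial U$$

As the solution to such a problem is unique, it follows that $u$ is the solution, i.e. the
Dirichlet-Poisson problem can be regarded as two separate problems, a Poisson problem and a Dirichlet
problem. Rewrite the problem suggestively as

$$\begin{pmatrix} -\Delta|_U \\ I|_{\partial U} \end{pmatrix} u = \begin{pmatrix} f \\ g \end{pmatrix}$$

and define $L$ by

$$L\begin{pmatrix} f \\ g \end{pmatrix} := L_P[f] + L_D[g]$$

and we see that

$$\begin{pmatrix} -\Delta|_U \\ I|_{\partial U} \end{pmatrix} u = \begin{pmatrix} f \\ g \end{pmatrix} \iff u = L\begin{pmatrix} f \\ g \end{pmatrix}$$

i.e. $L$ applied to the data gives the solution of the BVP.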
Another way to see the Green's function is to note that

$$\int_U G(x, y)\,f(y)\,dy$$

is a kind of weighted sum of terms $f(y)$, weighted by $G(x, y)$. Thus the solution $v(x)$ of
the Poisson problem may be regarded as a kind of "average" of the data $f$, weighted by the
Green's function. The Green's function $G(x, y)$ determines the "strength" of the contribution
of $f(y)$ to the solution at the point $x$. The fact that $G(x, y) = G(y, x)$ is like Newton's Third
Law: The strength of the effect of $y$ at $x$ is equal to the strength of the effect of $x$ at $y$.
Analogous remarks can be made for the Dirichlet problem: The solution $w(x)$ is a kind of
weighted average of the data $g$, weighted by the inward normal derivative $-\frac{\partial}{\partial n}G(x, y)$ of the
Green's function on the boundary.

We will further develop this interpretation when we discuss Green's functions for the heat
equation.






Exercise 14.3.14 We will solve the one-dimensional Dirichlet-Poisson problem

$$-u''(x) = f(x) \qquad 0 < x < 1$$
$$u(0) = u(1) = 0$$

(a) Show that the fundamental solution of the one-dimensional Laplace equation is

$$\Psi(x) = \tfrac{1}{2}|x|$$

(b) Hence solve the auxiliary problem for the corrector function, $\Delta h^x(y) = 0$, $h^x(0) =
\Psi(-x)$, $h^x(1) = \Psi(1 - x)$, to show that Green's function $G(x, y)$ for the open set $U = (0, 1)$
is given by

$$G(x, y) = \begin{cases} x(y - 1) & \text{if } y > x \\ y(x - 1) & \text{if } y < x \end{cases}$$

[Hint: Recall that $h^x$ must be smooth in the whole region $U$ to deduce that $h^x(y) =
a(x)y + b(x)$ for some functions $a, b$.]

(c) Note that $G$ is symmetric (which is always the case), and that $G$ is not singular (which
is hardly ever the case).

(d) Because $G$ is not singular, differentiation under the integral sign is permitted. Verify that

$$u(x) = \int_0^1 G(x, y)\,f(y)\,dy$$

solves the given BVP.

[Hint: Recall (or if you don't recall, prove) that

$$\frac{d}{dx}\int_{a(x)}^{b(x)} g(x, y)\,dy = g(x, b(x))\,b'(x) - g(x, a(x))\,a'(x) + \int_{a(x)}^{b(x)} \frac{\partial}{\partial x}g(x, y)\,dy$$

and split the integral: $\int_0^1 = \int_0^x + \int_x^1$.]

□ 
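As a sanity check on part (d), here is a minimal Python sketch (not part of the original notes) that applies the kernel exactly as displayed in part (b) to the sample data $f \equiv 1$, an arbitrary choice made for illustration. With this form of $G$ a direct calculation gives $u(x) = x(x-1)/2$, so that $u'' = f$ and $u(0) = u(1) = 0$; in other words the kernel built from $\Psi(x) = \tfrac{1}{2}|x|$ pairs with the equation $u'' = f$, and the solution of $-u'' = f$ is obtained by replacing $G$ with $-G$. The helper names `green`, `apply_green` and `second_difference` are hypothetical.

```python
def green(x, y):
    """Green's function for U = (0, 1), in the form displayed in part (b)."""
    return x * (y - 1.0) if y > x else y * (x - 1.0)


def apply_green(f, x, n=2000):
    """Midpoint-rule approximation of the integral of G(x, y) f(y) over (0, 1)."""
    h = 1.0 / n
    return sum(green(x, (k + 0.5) * h) * f((k + 0.5) * h) for k in range(n)) * h


def second_difference(u, x, h=1e-2):
    """Central second difference, a rough numerical stand-in for u''(x)."""
    return (u(x + h) - 2.0 * u(x) + u(x - h)) / (h * h)


f = lambda y: 1.0                  # sample data, chosen only for illustration
u = lambda x: apply_green(f, x)    # u(x) = integral of G(x, y) f(y) dy
```

Evaluating `u` at a few interior points and comparing with $x(x-1)/2$, or calling `second_difference(u, 0.5)`, gives a quick feel for how the kernel acts as an inverse of the second derivative.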



14.4 Green's Functions: Examples and Exercises 

14.4.1 Green's function for the half-space

Consider the half-space

$$\mathbb{H}^n := \{(x_1, \dots, x_n) \in \mathbb{R}^n : x_n > 0\}$$
$$\partial\mathbb{H}^n = \{(x_1, \dots, x_n) : x_n = 0\}$$

We want to determine the Green's function for this region. Note, however, that the
half-space is open, but unbounded, so that the results of the previous section may not hold.
However, if a "boundary condition at $+\infty$" holds, which asserts that the functions and their
derivatives tend to 0 as $\|x\| \to \infty$, then the results can be proven valid. In practice, one
proceeds as follows: Assume that the results of the previous section do hold, in order to find
a candidate $G$. Then show that this $G$ has the required properties.

Recall that, by Propn. 14.3.10, $G$ is the unique $C^2(\mathbb{H}^n)$-function such that, fixing $x_0$, and
regarding $G^{x_0}(x) := G(x_0, x)$ as a function of $x$:

(i) $G$ is harmonic in $\mathbb{H}^n$, except possibly at the point $x = x_0$.

(ii) The difference $G(x_0, x) - \Psi(x - x_0)$ is finite at $x_0$, and is harmonic.

(iii) $G(x_0, x) = 0$ for $x \in \partial\mathbb{H}^n$.

Note that the function $\Psi^{x_0}(x) := \Psi(x - x_0)$ satisfies (i) and (ii), but not the boundary
condition (iii). We now use a trick, called the method of images, to ensure that the
boundary condition $G = 0$ is satisfied by exploiting symmetry. (This method will be employed
again when we solve the Black-Scholes PDE to price barrier options.) Given a point
$x = (x_1, \dots, x_n) \in \mathbb{R}^n$, define its reflection in the plane $\partial\mathbb{H}^n$ by

$$x^* := (x_1, \dots, x_{n-1}, -x_n)$$

and note that $x = x^*$ if and only if $x \in \partial\mathbb{H}^n$. Suppose that $u$ is harmonic and such that
$u(x) = -u(x^*)$ for all $x$. By continuity, if $x \in \partial\mathbb{H}^n$, we see also that $u(x) = u(x^*)$,
and hence that $u(x) = 0$ for all $x \in \partial\mathbb{H}^n$. Thus: In order to find a harmonic function
which satisfies the boundary condition $u = 0$ on $\partial\mathbb{H}^n$, it suffices to find a harmonic function
satisfying the symmetry relation

$$u(x) = -u(x^*)$$

This immediately suggests a candidate for the Green's function, namely

$$G(x_0, x) := \Psi(x - x_0) - \Psi(x - x_0^*)$$

i.e. the corrector function reflects the singularity from $x_0$ to $x_0^*$ so that the contributions
add up to zero on the boundary. We now check that this $G$ satisfies (i)-(iii):

(i) $G$ is the difference of harmonic functions, and therefore harmonic, as the Laplace
operator is linear.

(ii) $G(x_0, x) - \Psi(x - x_0) = -\Psi(x - x_0^*)$ is finite at $x_0$: the singularity is at $x_0^* \notin \mathbb{H}^n$.

(iii) Recall that the fundamental solution $\Psi$ is radial, i.e. that $\Psi(x) = \Psi(r)$ depends only on $r = \|x\|$.
Since

$$\|x - x_0\| = \|x - x_0^*\| \quad \text{when } x \in \partial\mathbb{H}^n$$

it follows that $G(x_0, x) = \Psi(r) - \Psi(r) = 0$, where $r := \|x - x_0\| = \|x - x_0^*\|$.

In order to use the representation theorem 14.3.9, we need to know the normal derivative
$\frac{\partial G}{\partial n}$. The unit outward normal to $\partial\mathbb{H}^n$ is clearly a constant vector field $n = (0, \dots, 0, -1)$,
so that $\frac{\partial G}{\partial n} = -\frac{\partial G}{\partial x_n}$. It is straightforward to compute that

$$\frac{\partial G}{\partial n}(x_0, x) = -\frac{2x_{0n}}{A_n(1)\,\|x - x_0\|^n} \quad \text{when } x \in \partial\mathbb{H}^n$$








Exercise 14.4.1 Perform the computations for $\frac{\partial G}{\partial n}$ above, for both the cases $n = 2$ and
$n \geq 3$.



□ 



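Thus, from Theorem 14.3.9 we expect that

Theorem 14.4.2 (Poisson Formula) Define the Poisson kernel for $\partial\mathbb{H}^n$ by

$$K(x, y) := \frac{2x_n}{A_n(1)\,\|x - y\|^n}$$

Then the Dirichlet-Poisson problem

$$-\Delta u = f \quad \text{in } \mathbb{H}^n$$
$$u = g \quad \text{on } \partial\mathbb{H}^n$$

has solution

$$u(x) = \int_{\mathbb{H}^n} G(x, y)\,f(y)\,dy + \int_{\partial\mathbb{H}^n} K(x, y)\,g(y)\,dy$$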



□ 
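The boundary term of the Poisson formula can be tried out numerically. The following Python sketch is an illustration only, under stated assumptions: it takes $n = 2$ and $f \equiv 0$, uses $A_2(1) = 2\pi$ so that the kernel reads $K(x, y) = x_2 / \big(\pi((x_1 - y)^2 + x_2^2)\big)$, and approximates the integral over the boundary line after the substitution $y = x_1 + x_2\tan\theta$. The function name `poisson_half_plane` is hypothetical.

```python
import math


def poisson_half_plane(g, x1, x2, n=20000):
    """Approximate the integral of K(x, y) g(y) over the boundary line (n = 2).

    The substitution y = x1 + x2 * tan(theta) turns K(x, y) dy into
    dtheta / pi, so a midpoint rule on (-pi/2, pi/2) is enough.
    """
    h = math.pi / n
    total = 0.0
    for k in range(n):
        theta = -math.pi / 2.0 + (k + 0.5) * h
        total += g(x1 + x2 * math.tan(theta))
    return total * h / math.pi
```

For the boundary data $g(y) = 1/(1 + y^2)$ the harmonic extension to the upper half-plane is known in closed form, $u(x_1, x_2) = (1 + x_2)/(x_1^2 + (1 + x_2)^2)$, which gives a convenient value to compare against; constant data should be reproduced unchanged, since the kernel integrates to one.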



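Exercise 14.4.3 Use the method of images to determine the Green's function for the positive
quadrant

$$U := \{(x, y) \in \mathbb{R}^2 : x > 0, y > 0\}$$

[Hint: Fix a point $(\xi, \eta) \in \mathbb{R}^2$. We seek a function $G(\xi, \eta; x, y)$ of $(x, y)$ which is harmonic (in
$x, y$) and has the properties that $G(\xi, \eta; x, y) - \Psi(x - \xi, y - \eta)$ is harmonic and finite in $U$,
and such that

$$G(\xi, \eta; 0, y) = 0 = G(\xi, \eta; x, 0) \quad \text{for all } x > 0,\ y > 0$$

[Figure: a point and its reflected images.]

Show that

$$G(\xi, \eta; x, y) := \Psi(x - \xi, y - \eta) - \Psi(x + \xi, y - \eta) - \Psi(x - \xi, y + \eta) + \Psi(x + \xi, y + \eta)$$
$$= \frac{1}{4\pi}\ln\frac{\left[(\xi + x)^2 + (\eta - y)^2\right]\left[(\xi - x)^2 + (\eta + y)^2\right]}{\left[(\xi - x)^2 + (\eta - y)^2\right]\left[(\xi + x)^2 + (\eta + y)^2\right]}$$

does the trick.]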



□ 






14.4.2 Green's function for the ball 

Do the following exercise: 




Exercise 14.4.4 Let $U := B(0, R) \subseteq \mathbb{R}^n$ be the open ball of radius $R$ centered at the origin.
We seek the Green's function $G$ for the bounded region $U$.

(a) Again, we use a kind of reflection (an inversion, really) in the boundary. Given a point
$x_0$, we want the "reflection" $x_0^*$ to satisfy two properties:

(i) $x_0, x_0^*$ lie on a line through the origin; and,

(ii) $\|x_0\| \cdot \|x_0^*\| = R^2$, which ensures that the "reflection" of a point on the boundary
is itself.

Show that if $x_0 \neq 0$, the "reflection" of $x_0$ in $\partial U$ is given by

$$x_0^* = \frac{R^2 x_0}{\|x_0\|^2}$$

(b) Now for a little geometry: For an arbitrary $x \in \mathbb{R}^n$, define

$$r := \|x - x_0\| \qquad r^* := \|x - x_0^*\| \qquad \tilde{r} := \|x^* - x_0^*\|$$

Show that

$$\tilde{r} = \frac{R^2}{\|x_0\|\,\|x\|}\,r$$

[Hint: $\tilde{r}^2 = (x^* - x_0^*) \cdot (x^* - x_0^*) = \dots = \frac{R^4}{\|x_0\|^2\|x\|^2}\,r^2$]

(c) First, consider the case where $x_0 \neq 0$. As before the fundamental solution $\Psi$ is a candidate
for the Green's function $G$, but it fails to satisfy the boundary condition, namely that
$G(x_0, x) = 0$ for $x \in \partial B(0, R)$. To ensure that $G = 0$ is satisfied, we need to invert the
singularity. The contribution of the point $x_0^*$ at a point $x$ on the boundary must be just
enough to cancel the effect of $\Psi(x - x_0) = \Psi(r)$. We thus attempt a solution of the form

$$G^{x_0}(x) := \Psi(r) - k\,\Psi(r^*)$$

where the constant $k$ (which may depend on $x_0$, but not on $x$) is chosen so that $G^{x_0}(x) = 0$
when $x \in \partial B(0, R)$. Show that we require $k = \left(\frac{r^*}{r}\right)^{n-2}$. By observing that $\tilde{r} = r^*$ when
$x \in \partial B(0, R)$, conclude that

$$k = \left(\frac{R}{\|x_0\|}\right)^{n-2}$$

which is indeed independent of $x$.

(d) Show that

$$k\,\Psi(r^*) = \Psi\left(\frac{\|x_0\|}{R}\,r^*\right) \quad \text{i.e. that} \quad k\,\Psi(x - x_0^*) = \Psi\left(\frac{\|x_0\|}{R}(x - x_0^*)\right)$$
Conclude that the candidate Green's function is given by

$$G(x_0, x) := \Psi(x - x_0) - \Psi\left(\frac{\|x_0\|}{R}(x - x_0^*)\right) \quad \text{for } x_0 \neq 0$$

(e) Now show that $G$ satisfies the requirements of a Green's function.

(f) We still need to find the Green's function $G(x_0, x)$ in case $x_0 = 0$. Show that

$$G(0, x) = \Psi(x) - \Psi(R)$$

where $\Psi(R)$ denotes the (constant) value of $\Psi$ at radius $R$.

□ 
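The construction in parts (a)-(d) is easy to test numerically. The Python sketch below is an illustration under assumptions that are not taken from the notes: it fixes $n = 3$ and uses $\Psi(x) = 1/(4\pi\|x\|)$, i.e. the sign convention $-\Delta\Psi = \delta$ (with the opposite convention every value simply changes sign). The names `norm`, `psi` and `green_ball` are hypothetical.

```python
import math


def norm(v):
    """Euclidean norm of a point given as a list of coordinates."""
    return math.sqrt(sum(c * c for c in v))


def psi(v):
    """Fundamental solution for n = 3, assuming the convention -Laplace(psi) = delta."""
    return 1.0 / (4.0 * math.pi * norm(v))


def green_ball(x0, x, R):
    """Candidate Green's function for the ball B(0, R) in R^3, for x0 != 0."""
    s = norm(x0)
    x0_star = [R * R * c / (s * s) for c in x0]              # inversion of x0 in the sphere
    direct = [a - b for a, b in zip(x, x0)]                   # x - x0
    image = [(s / R) * (a - b) for a, b in zip(x, x0_star)]   # (||x0|| / R) (x - x0*)
    return psi(direct) - psi(image)
```

Two checks suggest themselves: `green_ball(x0, x, R)` should vanish (up to rounding) when `x` lies on the sphere of radius `R`, and swapping the two interior arguments should not change the value, in line with Proposition 14.3.13.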

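Exercise 14.4.5 We solve the Dirichlet problem

$$\Delta u = 0 \quad \text{in } B(0, R)$$
$$u = g \quad \text{on } \partial B(0, R)$$

Assume first that $R = 1$. By the representation theorem 14.3.9, we need merely find the
outward normal derivative $\frac{\partial G}{\partial n}$ on $\partial B(0, 1)$. Show that

$$\frac{\partial}{\partial n} = \frac{\partial}{\partial r} = \sum_{i=1}^n x_i\,\frac{\partial}{\partial x_i}$$

Then show that

$$\frac{\partial \Psi}{\partial x_i}(x - x_0) = -\frac{1}{A_n(1)}\,\frac{x_i - x_{0i}}{\|x - x_0\|^n}$$

and that

$$\frac{\partial}{\partial x_i}\Psi\left(\|x_0\|(x - x_0^*)\right) = -\frac{1}{A_n(1)\,\|x_0\|^{n-2}}\,\frac{x_i - x_{0i}^*}{\|x - x_0^*\|^n} = -\frac{1}{A_n(1)}\,\frac{x_i\|x_0\|^2 - x_{0i}}{\|x - x_0\|^n}$$

(Use the relation $\tilde{r} = Rr/\|x_0\|$ from the previous problem. Put $R = 1$ and note that $\tilde{r} = r^*$
when $x \in \partial B(0, 1)$. Furthermore, by definition $r^* := \|x - x_0^*\|$.) Sum to show that

$$\frac{\partial G}{\partial n}(x_0, x) = -\frac{1}{A_n(1)}\,\frac{1 - \|x_0\|^2}{\|x - x_0\|^n}$$

Thm. 14.3.9 now implies that

$$u(x) = \frac{1 - \|x\|^2}{A_n(1)}\oint_{\partial B(0,1)} \frac{g(y)}{\|y - x\|^n}\,dS(y)$$

If $R \neq 1$, use the change of variable $\tilde{u}(x) := u(Rx)$, $\tilde{g}(x) := g(Rx)$ to show that

$$u(x) = \frac{R^2 - \|x\|^2}{A_n(1)\,R}\oint_{\partial B(0,R)} \frac{g(y)}{\|y - x\|^n}\,dS(y)$$

i.e.

$$u(x) = \oint_{\partial B(0,R)} K(x, y)\,g(y)\,dS(y)$$

where $K(x, y) := \frac{R^2 - \|x\|^2}{A_n(1)\,R}\,\frac{1}{\|x - y\|^n}$ is Poisson's kernel for the ball $B(0, R)$.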



□ 



Exercise 14.4.6 Start with the solution of the Dirichlet problem for the ball of radius $R$,
given in Exercise 14.4.5, and restrict to the two-dimensional case. Convert to polar
coordinates in $\mathbb{R}^2$ to deduce that the solution of the Dirichlet problem with boundary data $g$
is

$$u(r, \theta) = \int_0^{2\pi} K(r, \theta; R, \phi)\,g(\phi)\,d\phi$$

where the Poisson kernel for a ball of radius $R$ is given by

$$K(r, \theta; R, \phi) = \frac{1}{2\pi}\,\frac{R^2 - r^2}{R^2 - 2Rr\cos(\theta - \phi) + r^2}$$
This is Poisson's formula (in $\mathbb{R}^2$).

□ 

Exercise 14.4.7 Use Poisson's formula, obtained in Exercise 14.4.6, to show that the solution
of

$$\Delta u = 0 \quad \text{in } B(0, \sqrt{6})$$
$$u = y + y^2 \quad \text{on } \partial B(0, \sqrt{6})$$

is given by

$$u(x, y) = 3 + y + \tfrac{1}{2}(y^2 - x^2)$$

[Hint: Convert to polar coordinates and show first that $u(r, \theta) = 3 - r^2\cos^2\theta + \tfrac{1}{2}r^2 + r\sin\theta$.]

□ 
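Poisson's formula in the plane lends itself to a direct numerical check of the claimed solution. The Python sketch below is an illustration under stated assumptions: it uses the planar kernel $\frac{1}{2\pi}\frac{R^2 - r^2}{R^2 - 2Rr\cos(\theta - \phi) + r^2}$ and takes the radius to be $R = \sqrt{6}$, which is the value forced by the data, since $3 + y + \tfrac{1}{2}(y^2 - x^2) = y + y^2$ on the circle $x^2 + y^2 = R^2$ exactly when $R^2/2 = 3$. The function name `poisson_disk` is hypothetical.

```python
import math


def poisson_disk(g, r, theta, R, n=4000):
    """Midpoint-rule value of the integral of K(r, theta; R, phi) g(phi) over [0, 2*pi]."""
    h = 2.0 * math.pi / n
    total = 0.0
    for k in range(n):
        phi = (k + 0.5) * h
        kernel = (R * R - r * r) / (R * R - 2.0 * R * r * math.cos(theta - phi) + r * r)
        total += kernel * g(phi)
    return total * h / (2.0 * math.pi)


R = math.sqrt(6.0)
# Boundary data y + y^2, written in terms of the polar angle phi on the circle of radius R.
g = lambda phi: R * math.sin(phi) + (R * math.sin(phi)) ** 2
```

Evaluating `poisson_disk(g, r, theta, R)` at the polar coordinates of an interior point $(x, y)$ and comparing with $3 + y + \tfrac{1}{2}(y^2 - x^2)$ is a quick way to confirm the exercise before attempting the analytic computation.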






Chapter 15 

The Heat Equation 



We study the heat equation

$$u_t - \tfrac{1}{2}\Delta u = 0$$

and its non-homogeneous counterpart

$$u_t - \tfrac{1}{2}\Delta u = f$$

subject to appropriate initial and boundary conditions. Here $u = u(t, x)$, where $t > 0$ is
time (which plays a special role) and $x = (x_1, \dots, x_n)$ is a spatial variable, $x \in U$ for some
open set $U \subseteq \mathbb{R}^n$. The Laplacian operator $\Delta$ refers to the spatial part only, i.e. $\Delta = \sum_{j=1}^n \frac{\partial^2}{\partial x_j^2}$.

Exercise 15.0.8 Suppose that $u(t, x)$ is a solution of the heat equation, that $x_0 \in \mathbb{R}^n$ and
that $O$ is an orthogonal $n \times n$-matrix. Show that the functions

$$v(t, x) := u(t, x - x_0) \qquad w(t, x) := u(t, Ox)$$

are also solutions of the heat equation. Conclude that the heat equation is invariant under 
spatial translations and rotations. 

□ 



15.1 Separation of Variables 

Let $L[\,\cdot\,]$ denote a linear partial differential operator, i.e. one so that $L[f] = 0$ is a linear PDE.
It is then easy to see that the set of solutions

$$\{f : L[f] = 0\}$$

is a linear space. For if $f, g$ are solutions and $\alpha, \beta$ are scalars, then

$$L[\alpha f + \beta g] = \alpha L[f] + \beta L[g] = 0$$

and one might hazard that if $f = \sum_{n=1}^\infty \alpha_n u_n$, and $L[u_n] = 0$ for all $n$, then also $L[f] =
0$. The separation of variables technique works by writing the solution $f(x)$ as a product
$f(x_1, \dots, x_n) = X_1(x_1) \cdots X_n(x_n)$ of $n$ functions $X_1, \dots, X_n$ of just one variable each,
and then adding these up to ensure that the correct initial and/or boundary conditions are
satisfied.






Example 15.1.1 Suppose that we want to find the solution $u(t, x)$ which satisfies the
following conditions:

PDE  $u_t = \alpha^2 u_{xx}$,  $0 < x < L$, $0 < t < \infty$
BC   $u(t, 0) = 0 = u(t, L)$,  $0 < t < \infty$
IC   $u(0, x) = \phi(x)$

This corresponds to the following physical situation: $u(t, x)$ is the temperature at time $t$ at
point $x$, in a laterally insulated rod of length $L$, both of whose ends are kept at a temperature
of 0.

To solve this problem, we attempt to separate the time and spatial variables, and
hypothesize that the solution has the form

$$u(t, x) = T(t)X(x)$$

Substituting this into the PDE yields

$$T'(t)X(x) = \alpha^2 T(t)X''(x)$$

which is equivalent to

$$\frac{T'(t)}{\alpha^2 T(t)} = \frac{X''(x)}{X(x)} \qquad (*)$$

Now since the lefthand side of (*) is independent of $x$, and since the righthand side equals the
lefthand side, the righthand side is independent of $x$ also! Similarly, both sides are independent
of $t$ (because the righthand side is independent of $t$). Thus both sides are constant, i.e. there
is a constant $k$ such that

$$\frac{T'(t)}{\alpha^2 T(t)} = k = \frac{X''(x)}{X(x)}$$

The PDE has now been reduced to two ODE's:

$$T' - k\alpha^2 T = 0$$
$$X'' - kX = 0$$

Solving first for $T$, we see that

$$T(t) = Ce^{k\alpha^2 t} \qquad C \text{ constant}$$

This blows up if $k > 0$, which is "unphysical", and we therefore assume that $k = -\lambda^2 < 0$.
Solving the ODE for $X$, we get

$$X(x) = A\cos\lambda x + B\sin\lambda x \qquad A, B \text{ constant}$$

so that

$$u(t, x) = e^{-\lambda^2\alpha^2 t}\left[A\cos\lambda x + B\sin\lambda x\right]$$

This gives us a solution to the problem for every value of $\lambda, A, B$.

We now impose the boundary conditions: The requirement that $u(t, 0) = 0$ for all $t$ is
easily seen to imply that $A = 0$. The requirement that $u(t, L) = 0$ for all $t$ then implies that
$\sin\lambda L = 0$, and thus it is necessary that $\lambda = \frac{n\pi}{L}$ for some $n \in \mathbb{N}$. We thus see that each
of the functions

$$u_n(t, x) = e^{-\lambda_n^2\alpha^2 t}\sin\lambda_n x \qquad \lambda_n := \frac{n\pi}{L},\ n \in \mathbb{N}$$

is a solution of the PDE and the boundary conditions.

Thus far, we have not taken into account the initial temperature $u(0, x) = \phi(x)$. If it
should happen that $\phi(x) = B\sin\frac{n\pi x}{L}$ for some $n \in \mathbb{N}$, then the solution has now been found:
It is $u(t, x) = Be^{-(n\pi\alpha/L)^2 t}\sin\frac{n\pi x}{L}$. Of course, it is not likely that the initial condition $\phi(x)$ is
of sinusoidal form. Nevertheless, we have seen that "reasonable" functions can be expanded
in terms of Fourier series. Moreover, by the Principle of Superposition, a linear combination
of solutions to the PDE and BC is again a solution. We thus ask if we can find constants $B_n$
so that

$$\phi(x) = \sum_{n=1}^\infty B_n\sin\frac{n\pi x}{L}$$

and this is easily done:

• Extend $\phi$ to $[-L, 0)$ by requiring that $\phi(-x) = -\phi(x)$, and then extend $\phi$ to all of $\mathbb{R}$
by requiring that it be periodic, with period $2L$.

• Note that the resulting (extended) $\phi$ is an odd function, and thus the Fourier expansion
of $\phi$ will have only sine terms, and no cosine terms.

• From Fourier analysis, we know that

$$B_n = \frac{1}{L}\int_{-L}^{L} \phi(x)\sin\frac{n\pi x}{L}\,dx = \frac{2}{L}\int_0^L \phi(x)\sin\frac{n\pi x}{L}\,dx$$

So we have now obtained the solution:

$$u(t, x) = \sum_{n=1}^\infty B_n e^{-(n\pi\alpha/L)^2 t}\sin\frac{n\pi x}{L} \qquad \text{where } B_n = \frac{2}{L}\int_0^L \phi(x)\sin\frac{n\pi x}{L}\,dx$$



Further Discussion: 



□ 
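The series solution of Example 15.1.1 can be sketched in a few lines of Python. This is an illustration rather than a definitive implementation: the coefficients $B_n = \frac{2}{L}\int_0^L \phi(x)\sin\frac{n\pi x}{L}\,dx$ are approximated with a midpoint rule, the series is truncated after a fixed number of terms, and the names `sine_coefficient` and `heat_series` as well as the default parameter values are hypothetical.

```python
import math


def sine_coefficient(phi, n, L, m=2000):
    """B_n = (2/L) * integral_0^L phi(x) sin(n pi x / L) dx, by the midpoint rule."""
    h = L / m
    s = 0.0
    for k in range(m):
        x = (k + 0.5) * h
        s += phi(x) * math.sin(n * math.pi * x / L)
    return 2.0 / L * s * h


def heat_series(phi, t, x, L=1.0, alpha=1.0, terms=50):
    """Partial sum of the separated solution for u_t = alpha^2 u_xx with zero end temperatures."""
    total = 0.0
    for n in range(1, terms + 1):
        lam = n * math.pi / L                      # lambda_n = n pi / L
        decay = math.exp(-(lam * alpha) ** 2 * t)  # time factor T(t)
        total += sine_coefficient(phi, n, L) * decay * math.sin(lam * x)
    return total
```

When the initial temperature is itself a single mode, say $\phi(x) = \sin\frac{\pi x}{L}$, only $B_1$ survives and the partial sum should agree with $e^{-(\pi\alpha/L)^2 t}\sin\frac{\pi x}{L}$, which makes a convenient test case.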



• The separation of variables technique attempts to reduce the PDE to two ODE's. In
the above example, we reduced the PDE

$$u_t = \alpha^2 u_{xx}$$

to the two ODE's

$$\frac{dT}{dt} = k\alpha^2 T \qquad \frac{d^2X}{dx^2} = kX$$

Hence $T$ is an eigenfunction (i.e. an eigenvector in a function space) of the linear
operator $\frac{d}{dt}$, with eigenvalue $k\alpha^2$, and $X$ is an eigenfunction of the linear operator $\frac{d^2}{dx^2}$,
with eigenvalue $k$.

• Above, we glibly asserted that $k < 0$. Let us verify what happens if $k = 0$ or $k > 0$.

- If $k = 0$, then $X(x) = A + Bx$, and $T(t)$ is constant, so we can write $u(t, x) =
A + Bx$. The boundary conditions are now easily seen to imply that $A = 0 = B$.

- If $k > 0$, we can write $k = \lambda^2$. Then $X(x) = Ae^{\lambda x} + Be^{-\lambda x}$ and $T(t) = Ce^{\lambda^2\alpha^2 t}$. The
solution for $X$ may also be written in terms of hyperbolic functions: Recall that
$\cosh x := \frac{e^x + e^{-x}}{2}$, $\sinh x := \frac{e^x - e^{-x}}{2}$

[Figure: graphs of cosh(x) and sinh(x).]

so that we may write $X(x) = A\cosh\lambda x + B\sinh\lambda x$ for some (different) constants
$A, B$. We thus obtain

$$u(t, x) = e^{\lambda^2\alpha^2 t}\left[A\cosh\lambda x + B\sinh\lambda x\right]$$

With the help of the graphs of cosh and sinh it is easily verified that the boundary
conditions imply that $A = 0 = B$ in this case also.

• The Separation of Variables technique relies on the Principle of Superposition,
and thus only applies to linear PDE's. Moreover, at least one of the independent
variables must be restricted to a finite interval. The domain of the problem must
be consistent with the coordinate system, e.g. it must be a rectangle in cartesian
coordinates, or a sector in polar coordinates, etc. The boundary conditions must
be linear and homogeneous also, e.g. of the form

$$\alpha u_x(t, 0) + \beta u(t, 0) = 0$$
$$\gamma u_x(t, L) + \delta u(t, L) = 0$$

Note, however, that there are various tricks that may be used to turn a problem
with non-homogeneous boundary conditions into one with homogeneous
boundary conditions.



□ 






Example 15.1.2 Consider a laterally insulated rod of length $L = 1$. The initial temperature
is $\phi(x)$. The lefthand end of the rod is kept at 0 °C, whereas the righthand
end is maintained at 100 °C. The ICBVP for this is

PDE  $u_t = \alpha^2 u_{xx}$,  $0 < x < 1$, $0 < t < \infty$
BC   $u(t, 0) = 0$, $u(t, 1) = 100$,  $0 < t < \infty$
IC   $u(0, x) = \phi(x)$

If we attempt to use separation of variables, we run into trouble: Because the boundary
conditions are not homogeneous, the sum of solutions will not be a solution (e.g. if $u_1, u_2$ are
solutions, then $(u_1 + u_2)(t, 1) = 200 \neq 100$).

Nevertheless, we can employ a trick: We break up the solution $u$ into a steady state
solution $u^S$ (obtained in the limit as $t \to \infty$) and a transient solution $u^T$ (which will go to
zero as $t \to \infty$), i.e. we write

$$u(t, x) = u^S(t, x) + u^T(t, x)$$

The steady state solution is, by definition, time independent, and will therefore satisfy $u^S_t = 0$.
Then $u^S_{xx} = 0$ also, so $u^S(t, x) = A + Bx$. The boundary conditions are easily seen to imply
that $A = 0$, $B = 100$, and thus the steady-state solution is $u^S(t, x) = 100x$, just as one would
expect. Thus we have

$$u(t, x) = 100x + u^T(t, x) \quad \text{i.e.} \quad u^T(t, x) = u(t, x) - 100x$$

It is now easy to find an ICBVP for $u^T$:

PDE  $u^T_t = \alpha^2 u^T_{xx}$,  $0 < x < 1$, $0 < t < \infty$
BC   $u^T(t, 0) = u^T(t, 1) = 0$,  $0 < t < \infty$
IC   $u^T(0, x) = \phi^T(x) := \phi(x) - 100x$

These boundary conditions are homogeneous. From the previous example, we now know that

$$u^T(t, x) = \sum_{n=1}^\infty B_n e^{-(n\pi\alpha)^2 t}\sin n\pi x \qquad \text{where } B_n = 2\int_0^1 \phi^T(x)\sin n\pi x\,dx$$

and thus $u(t, x) = 100x + u^T(t, x)$.

□ 
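To see the steady-state plus transient decomposition at work, here is a small Python sketch. It rests on an assumption made purely for illustration, namely that the rod starts at temperature zero ($\phi \equiv 0$), so that $\phi^T(x) = -100x$ and the coefficients have the closed form $B_n = 2\int_0^1 (-100x)\sin n\pi x\,dx = \frac{200(-1)^n}{n\pi}$. The function name `rod_temperature` and the truncation level are hypothetical.

```python
import math


def rod_temperature(t, x, alpha=1.0, terms=200):
    """u(t, x) = 100 x + transient series, assuming phi = 0 so that phi^T(x) = -100 x."""
    total = 100.0 * x                                  # steady state u^S(x) = 100 x
    for n in range(1, terms + 1):
        b_n = 200.0 * (-1) ** n / (n * math.pi)        # 2 * int_0^1 (-100 x) sin(n pi x) dx
        decay = math.exp(-(n * math.pi * alpha) ** 2 * t)
        total += b_n * decay * math.sin(n * math.pi * x)
    return total
```

For large $t$ the transient terms die out and the value approaches the straight line $100x$; for very small $t$ an interior point is still close to the initial temperature 0, because the heat from the hot end has not yet arrived.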

The best way to familiarize yourself with this technique is to do the following exercises: 

Exercise 15.1.3 Consider an insulated rod (laterally as well as at the ends) of length $L = 1$ with
initial temperature $\phi(x)$. In this case, the ICBVP is

PDE  $u_t = u_{xx}$,  $0 < x < 1$, $0 < t < \infty$
BC   $u_x(t, 0) = u_x(t, 1) = 0$,  $0 < t < \infty$
IC   $u(0, x) = \phi(x)$

(a) Write $u(t, x) = T(t)X(x)$ and reduce the problem to two ODE's.

(b) Deduce that $T(t) = Ce^{-\lambda^2 t}$ and that $X(x) = A\cos\lambda x + B\sin\lambda x$.

(c) Why is $B = 0$?

(d) Why is $\lambda = n\pi$ for some $n \in \mathbb{N}$?

(e) We now have a solution $u_n(t, x) = A_n e^{-(n\pi)^2 t}\cos n\pi x$ for each $n = 0, 1, 2, \dots$. Why can
we say that

$$u(t, x) = \frac{A_0}{2} + \sum_{n=1}^\infty A_n e^{-(n\pi)^2 t}\cos n\pi x$$

is also a solution? (We have renamed $A_0$ to become $\frac{A_0}{2}$ here, because this is more
convenient for Fourier series.)

(f) We now deal with the initial condition. Explain why we require that

$$\phi(x) = \frac{A_0}{2} + \sum_{n=1}^\infty A_n\cos n\pi x \qquad \text{for } 0 \leq x \leq 1$$

(g) We thus need a Fourier expansion of $\phi$ purely in cosines. Explain how to do this. [Hint:
Make $\phi$ an even function, and then periodic with period 2.]

(h) Conclude that

$$A_n = 2\int_0^1 \phi(x)\cos n\pi x\,dx \qquad n = 0, 1, 2, \dots$$

(i) What is the steady-state solution to this problem? How do you interpret this solution?

15.2 The Fundamental Solution

When we solved Laplace's equation $\Delta u = 0$, we obtained our first solution (the fundamental
solution) by exploiting an invariance property of the Laplacian operator, namely the
fact that Laplace's equation is invariant under rotations. This means that if $u(x)$ is
a solution of Laplace's equation, and $O$ is an orthogonal matrix, then $u(Ox)$ is a solution
also. Exercise 15.0.8 shows a similar result for the heat equation, but holds only for spatial
rotations, and does not involve time. The structure of the heat equation suggests another
kind of invariance. The expression $u_t - \tfrac{1}{2}\Delta u = 0$ contains one derivative with respect to the
time variable, and two with respect to each spatial variable. Thus if $u(t, x)$ is a solution, so
is $u(\lambda^2 t, \lambda x)$ (for $\lambda > 0$), i.e. if we scale space by a factor of $\lambda$ and time by a factor of $\lambda^2$, the
resulting function remains a solution of the heat equation. Now if we define $r := \|x\|$, then

$$\frac{(\lambda r)^2}{\lambda^2 t} = \frac{r^2}{t}$$

and this suggests that we seek a solution $v$ which is a function of $\frac{r^2}{t}$. As in the case of
Laplace's equation (where we chose a solution which is a function of just r), exploiting the 
symmetry of the PDE will turn it into an ODE, which we can solve. 
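
The scaling invariance used in this paragraph can be confirmed in one line by the chain rule. In the worked computation below, $w$ is a name introduced here only for illustration, and all derivatives of $u$ are evaluated at the point $(\lambda^2 t, \lambda x)$:

```latex
w(t, x) := u(\lambda^2 t, \lambda x)
\quad\Longrightarrow\quad
w_t = \lambda^2 u_t, \qquad \Delta w = \lambda^2 \Delta u,
\qquad\text{so}\qquad
w_t - \tfrac{1}{2}\Delta w = \lambda^2\left(u_t - \tfrac{1}{2}\Delta u\right) = 0 .
```

The single time derivative and the two space derivatives each pick up the same factor $\lambda^2$, which is exactly why time must be scaled by $\lambda^2$ when space is scaled by $\lambda$.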






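Exercise 15.2.1 We do this for the case $n = 1$. Put $\xi := \frac{x}{\sqrt{t}}$ and suppose that $v(t, x) = v(\xi)$
is a solution of the heat equation $v_t - \tfrac{1}{2}v_{xx} = 0$. Show that

$$v_t = -\frac{\xi}{2t}\,v'(\xi) \qquad v_{xx} = \frac{1}{t}\,v''(\xi)$$

where $v'(\xi) = \frac{dv}{d\xi}$, and obtain

$$v''(\xi) + \xi\,v'(\xi) = 0$$

Define $w := v'$ and integrate to obtain

$$w(\xi) = Ce^{-\xi^2/2}$$

Integrate again to obtain

$$v(\xi) = C\int_{-\infty}^{\xi} e^{-y^2/2}\,dy$$

i.e.

$$v(t, x) = C\int_{-\infty}^{x/\sqrt{t}} e^{-y^2/2}\,dy$$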

□ 

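Now that we have obtained a solution

$$u(t, x) = C\int_{-\infty}^{x/\sqrt{t}} e^{-y^2/2}\,dy$$

for the one-dimensional heat equation, we may ask what initial conditions this solution
satisfies. The solution is undefined at $t = 0$, so we examine the behaviour of $u(t, x)$ as $t \to 0$.
Fortunately, we recognize in $e^{-y^2/2}$ a function which is closely related to the density of a
standard normal random variable $Z$. This suggests that we choose $C = \frac{1}{\sqrt{2\pi}}$ so that

$$u(t, x) = \mathbb{P}\left(Z \leq \frac{x}{\sqrt{t}}\right)$$

Now

$$\lim_{t \to 0^+} \frac{x}{\sqrt{t}} = \begin{cases} +\infty & \text{if } x > 0 \\ 0 & \text{if } x = 0 \\ -\infty & \text{if } x < 0 \end{cases}$$

and hence

$$u(0^+, x) = \begin{cases} 1 & \text{if } x > 0 \\ \tfrac{1}{2} & \text{if } x = 0 \\ 0 & \text{if } x < 0 \end{cases}$$

Now note that if $u$ solves the heat equation, so does $u_x$:

$$(u_x)_t = (u_t)_x = \left(\tfrac{1}{2}u_{xx}\right)_x = \tfrac{1}{2}(u_x)_{xx}$$

i.e.

$$v_t = \tfrac{1}{2}v_{xx} \quad \text{where } v := u_x$$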






For the function $u$ above, we have, by the Fundamental Theorem of Calculus, that $u_x(t, x) =
\Phi(t, x)$ where

$$\Phi(t, x) := \frac{1}{\sqrt{2\pi t}}\,e^{-x^2/2t}$$

$\Phi(t, x)$ is called the fundamental solution of the heat equation, and also the heat kernel.
The initial conditions corresponding to the solution $\Phi$ are quite peculiar: We see that

$$\lim_{t \to 0^+} \Phi(t, x) = 0 \quad \text{for all } x \neq 0$$

However, the limiting behaviour at $x = 0$ is not immediately obvious. When we remember
that $\Phi(t, x)$ is the density of an $N(0, t)$-random variable, however, we see that the graphs of
the $\Phi(t, x)$ are bell curves centered at $x = 0$ that are becoming ever more peaked, in order to
ensure that the total area under each curve $\Phi(t, x)$ (for fixed $t$) equals unity. Thus $\Phi(0, x)$ is
a very peculiar function: it vanishes for all $x \neq 0$, and yet

$$\int_{-\infty}^{\infty} \Phi(0, x)\,dx = 1$$

Of course, if $\Phi(0, x) = 0$ for all $x \neq 0$, then $\Phi(0, x) = 0$ $\lambda$-a.e., where $\lambda$ denotes Lebesgue
measure, and thus (by the properties of the Lebesgue integral) $\int_{-\infty}^{\infty} \Phi(0, x)\,dx = \int_{-\infty}^{\infty} 0\,dx =
0$. In other words, there cannot be a function $x \mapsto \Phi(0, x)$ with the required properties.




The object $\Phi(0, x)$ is therefore not a function. In fact, if we keep thinking probabilistically,
we see that it corresponds to the distribution of a random variable with mean 0 and variance
0, i.e. $\Phi(0, x)$ is the "density" of a random variable $X$ which has $X = 0$ a.s., i.e. a point
mass at 0. It therefore has the following important property: If $f : \mathbb{R} \to \mathbb{R}$ is a (sufficiently
regular) function, then

$$\int_{-\infty}^{\infty} \Phi(0, x)\,f(x)\,dx = \mathbb{E}[f(X)] = f(0)$$

i.e. though we cannot think of $\Phi(0, x)$ as a function $\mathbb{R} \to \mathbb{R}$, we can think of it as a functional,
i.e. a function from a set of (sufficiently regular¹) functions to $\mathbb{R}$:

$\Phi(0, x)$ is a rule which assigns the real number $f(0)$ to the function $f$

This "rule" is called the Dirac delta function, denoted $\delta_0$, and was first introduced by the
physicist Paul Dirac in the 1920's in connection with quantum mechanics. It was only in the
late 1940's that Laurent Schwartz developed a rigorous mathematical theory of such objects,
called distributions or generalized functions. Though a rigorous development of generalized
functions requires deep results from functional analysis, an intuitive development will provide
insight into the solution of Cauchy problems, and into the nature of Green's functions. We
will tackle this soon, in Section 15.5.
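
The statement that $\Phi(t, \cdot)$ acts more and more like the rule $f \mapsto f(0)$ as $t \to 0$ can be observed numerically. The following Python sketch is only an illustration: it uses the one-dimensional kernel $\Phi(t, x) = \frac{1}{\sqrt{2\pi t}}e^{-x^2/2t}$, truncates the integral to ten standard deviations (an assumption that is harmless for bounded $f$), and the names `heat_kernel` and `smoothed_value` are hypothetical.

```python
import math


def heat_kernel(t, x):
    """One-dimensional heat kernel, i.e. the N(0, t) density."""
    return math.exp(-x * x / (2.0 * t)) / math.sqrt(2.0 * math.pi * t)


def smoothed_value(f, t, n=4000, width=10.0):
    """Midpoint-rule value of the integral of Phi(t, x) f(x) over |x| <= width * sqrt(t)."""
    a = width * math.sqrt(t)
    h = 2.0 * a / n
    total = 0.0
    for k in range(n):
        x = -a + (k + 0.5) * h
        total += heat_kernel(t, x) * f(x)
    return total * h
```

For $f = \cos$ the integral is known exactly, $\int \Phi(t, x)\cos x\,dx = e^{-t/2}$, so `smoothed_value(math.cos, t)` should track $e^{-t/2}$ and tend to $\cos 0 = 1$ as $t$ shrinks.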

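For the multidimensional case we define the fundamental solution as follows:

Definition 15.2.2 (Fundamental Solution of the Heat Equation)
The function

$$\Phi(t, x) := \frac{1}{(2\pi t)^{n/2}}\,e^{-\|x\|^2/2t} \qquad x \in \mathbb{R}^n,\ t > 0$$

is called the fundamental solution of the heat equation.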



□ 



Remarks 15.2.3 Note that $\Phi(t, x)$ is the joint density function of a random vector $(Z_1, \dots, Z_n)$
of independent $N(0, t)$-variables, i.e. normal random variables with mean 0 and variance $t$.
We therefore know that

$$\int_{\mathbb{R}^n} \Phi(t, x)\,dx = 1 \quad \text{for each } t > 0$$

Just as we did for the fundamental solution of Laplace's equation, we will occasionally write
$\Phi(t, x) = \Phi(t, \|x\|)$ to highlight the fact that it is radially symmetric in the spatial variable
$x$.



Exercise 15.2.4 (a) Verify that $\Phi$ solves the heat equation $u_t - \tfrac{1}{2}\Delta u = 0$.
(b) Show that $\Phi$ explodes at $(0, 0)$, in such a way that

$$\lim_{t \to 0^+} \int_{\mathbb{R}^n} \Phi(t, x)\,f(x)\,dx = f(0)$$

for sufficiently regular $f : \mathbb{R}^n \to \mathbb{R}$.

[Hint: Adapt the argument for the one-dimensional $\Phi(t, x)$ given earlier.]

(c) Show that $\Phi(t, x) \in C^\infty((0, \infty) \times \mathbb{R}^n)$.



□ 



□ 



¹ "Sufficiently regular" is code for "the function is nice enough to allow me to do what I'm about to do to it".






15.3 Solving the Heat Equation 
15.3.1 The Cauchy Problem 

We will first concern ourselves with the Cauchy problem on the unbounded domain $U :=
\mathbb{R}^+ \times \mathbb{R}^n$. Note that $\partial U = \{0\} \times \mathbb{R}^n$. We seek a function $u : \mathbb{R}^+ \times \mathbb{R}^n \to \mathbb{R}$ which solves

$$u_t - \tfrac{1}{2}\Delta u = 0 \quad \text{in } (0, \infty) \times \mathbb{R}^n$$
$$u = g \quad \text{on } \{0\} \times \mathbb{R}^n \qquad (*)$$

The "boundary condition" $u = g$ is an initial condition: $u(0, x) = g(x)$ for all $x \in \mathbb{R}^n$.

We first proceed intuitively, in the one-dimensional case: Fix a $y \in \mathbb{R}$, and regard $x$
as a variable. Then $\Phi(t, x - y)$ is just a translation of $\Phi(t, x)$ and thus a solution of the
one-dimensional heat equation. In this way, we can get, for different values $y_k \in \mathbb{R}$, many
solutions $\Phi(t, x - y_k)$ of the heat equation. The heat equation is linear, however, and thus a
linear combination of solutions is again a solution. In particular (regarding each $y_k$ as fixed)
the sum

$$\sum_{k=1}^m \Phi(t, x - y_k)\,g(y_k)\,\Delta y_k$$

is a solution, where $\Delta y_k := y_k - y_{k-1}$, and the "coefficients" $g(y_k)\Delta y_k$ of the linear combination
are "constant", because each $y_k$ is regarded as fixed. In the limit, the linear combination can
be made an integral:

$$u(t, x) = \int_{-\infty}^{\infty} \Phi(t, x - y)\,g(y)\,dy$$

and assuming that all functions involved are sufficiently regular to allow interchange of limit,
derivative and integral (this needs proof), we guess that $u(t, x)$, as a limit of solutions to the
heat equation, should itself be a solution.

To obtain the initial conditions satisfied by this solution, we adopt the probabilistic
approach: For fixed $t, x$, the function $y \mapsto \Phi(t, x - y)$ is the density of a $N(x, t)$-random variable,
so

$$u(t, x) = \mathbb{E}[g(X)] \quad \text{where } X \sim N(x, t)$$
In the limit $t \to 0$, the distribution $N(x, t)$ tends to a point mass at $x$, and so

$$u(0, x) = \mathbb{E}[g(X)] = g(x) \quad \text{where } X = x \text{ a.s.}$$

We have thus shown intuitively that the function

$$u(t, x) = \int_{-\infty}^{\infty} \Phi(t, y - x)\,g(y)\,dy$$

is a solution to the Cauchy problem (*), given above. Let's now show it formally, in $n$
dimensions:

Theorem 15.3.1 (Solution of Cauchy Problem for Heat Equation in $\mathbb{R}^n$)
Assume that $g : \mathbb{R}^n \to \mathbb{R}$ is a bounded continuous function, and define

$$u(t, x) = \int_{\mathbb{R}^n} \Phi(t, x - y)\,g(y)\,dy$$

Then $u$ is a solution of the heat equation $u_t - \tfrac{1}{2}\Delta u = 0$, with

$$\lim_{(t, x) \to (0, x_0)} u(t, x) = g(x_0) \quad \text{for all } x_0 \in \mathbb{R}^n \quad \text{(where } t > 0,\ x \in \mathbb{R}^n\text{)}$$

Proof: (I): Wc first sliow that u satisfies the heat equation. Since $ blows up at t = 0, wc fix 
(5 > and consider only times t > 5. It is straightforward to verify that tlic partial derivatives 
^xiXi^^t of the fundamental solution exist and are uniformly bounded on (5, oo) x M."", i.e. 
there is a constant Ms so that ^^iXi^^t < for all {t, x) G {6, oo) x M". It follows that for 
such {t, x) we have 

utit, x) - lAu{t, x) = / X - y) - lA^{t, x - y)]5(y) dy 

as interchange of integral and derivative is allowed, by the dominated convergence theorem. 
Since $ is a solution to the heat equation, we see, by allowing 5 — 0, that 

ut{t, x) - |A(i, x) = for all (t, x) G (0, oo) x M" 



(II): We now tackle the initial condition. Fix $x_0 \in \mathbb{R}^n$ and $\varepsilon > 0$. We want to show that
$|u(t,x) - g(x_0)| < \varepsilon$ when $(t,x)$ lies sufficiently close to $(0, x_0)$. First, by continuity of $g$, there
is $\delta > 0$ (not the same as the $\delta$ in (I)) so that

\[ |g(y) - g(x_0)| < \varepsilon/2 \quad \text{whenever } \|y - x_0\| < \delta \]

Suppose that $\|x - x_0\| < \delta/2$. Since $\int_{\mathbb{R}^n} \Phi(t, x - y)\, dy = 1$, we have $g(x_0) = \int_{\mathbb{R}^n} \Phi(t, x - y)\, g(x_0)\, dy$ and so



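\[ |u(t,x) - g(x_0)| = \Big| \int_{\mathbb{R}^n} \Phi(t, x - y)\, \big( g(y) - g(x_0) \big)\, dy \Big| \]
\[ \le \int_{\mathbb{R}^n} \Phi(t, x - y)\, |g(y) - g(x_0)|\, dy \]
\[ = \int_{B(x_0,\delta)} \Phi(t, x - y)\, |g(y) - g(x_0)|\, dy + \int_{\mathbb{R}^n - B(x_0,\delta)} \Phi(t, x - y)\, |g(y) - g(x_0)|\, dy \]

We consider each of the integrals $\int_{B(x_0,\delta)}$, $\int_{\mathbb{R}^n - B(x_0,\delta)}$ in turn:
(IIa): For $y \in B(x_0, \delta)$ we have $|g(y) - g(x_0)| < \varepsilon/2$, and hence

\[ \int_{B(x_0,\delta)} \Phi(t, x - y)\, |g(y) - g(x_0)|\, dy \le \frac{\varepsilon}{2} \int_{B(x_0,\delta)} \Phi(t, x - y)\, dy \le \frac{\varepsilon}{2} \int_{\mathbb{R}^n} \Phi(t, x - y)\, dy = \varepsilon/2 \]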

(IIb): For the remaining integral, recall that $g$ is assumed bounded, and let $K$ be a bound,
i.e. suppose that $|g(x)| \le K$ for all $x \in \mathbb{R}^n$. We are also still assuming that $\|x - x_0\| < \delta/2$.
For $y \in \mathbb{R}^n - B(x_0, \delta)$, we have

\[ \|y - x_0\| \le \|y - x\| + \|x - x_0\| < \|y - x\| + \delta/2 \le \|y - x\| + \tfrac12 \|y - x_0\| \]

so that

\[ \|y - x\| > \tfrac12 \|y - x_0\| \]






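Hence, since $|g(y) - g(x_0)| \le 2K$,

\[ \int_{\mathbb{R}^n - B(x_0,\delta)} \Phi(t, x - y)\, |g(y) - g(x_0)|\, dy \le 2K \int_{\mathbb{R}^n - B(x_0,\delta)} \Phi(t, x - y)\, dy \]
\[ = \frac{2K}{(2\pi t)^{n/2}} \int_{\mathbb{R}^n - B(x_0,\delta)} e^{-\|x - y\|^2/2t}\, dy \]
\[ \le \frac{2K}{(2\pi t)^{n/2}} \int_{\mathbb{R}^n - B(x_0,\delta)} e^{-\|y - x_0\|^2/8t}\, dy \]
\[ = 2^{n+1} K\, \mathbb{P}(\|Z - x_0\| \ge \delta) \]

where $Z = (Z_1, \dots, Z_n)$ and the $Z_i$ are independent $N(x_{0i}, 4t)$-variables,

\[ \to 0 \quad \text{as } t \to 0^+ \]

Thus for $t > 0$ sufficiently small, we can ensure that $\int_{\mathbb{R}^n - B(x_0,\delta)} \Phi(t, x - y)\, |g(y) - g(x_0)|\, dy < \varepsilon/2$.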

(IIc): Putting (IIa) and (IIb) together, we see that $|u(t,x) - g(x_0)| < \varepsilon/2 + \varepsilon/2 = \varepsilon$ whenever
$\|x - x_0\| < \delta/2$ and $t > 0$ is sufficiently small.

⊣

Remarks 15.3.2 We have now shown that the function $u(t,x) = \int_{\mathbb{R}^n} \Phi(t, x - y)\, g(y)\, dy$ is
a solution to the Cauchy problem for the heat equation with initial data $g$ on the unbounded
domain $\mathbb{R}^+ \times \mathbb{R}^n$. Can there be other solutions? The answer is: yes and no. Yes, there may be
other solutions to this problem. These others will all blow up, however. Using an analogue of
the maximum principle (which we used to prove uniqueness of solutions to Laplace's equation)
it can be shown that there is at most one solution $u$ satisfying a growth bound

\[ u(t,x) \le A e^{B\|x\|^2} \quad \text{for all } x \in \mathbb{R}^n \]

for constants $A, B > 0$, i.e. all the other solutions grow extremely fast. This is unphysical
when dealing with actual heat problems.

It is also unfinancial: If $C(t,S)$ is the price of a call at time $t$ before maturity, when the
underlying price is $S$, we expect by arbitrage considerations that $C(t,S) \le S$ (explain why!)
and that $\frac{C(t,S)}{S} \to 1$ as $S \to \infty$. Assuming that $C$ is the solution to some kind of heat
equation (the topic of the next section), such rapid growth is not permitted, as it will
imply that $\frac{C(t,S)}{S} \to \infty$ as $S \to \infty$, so that $C(t,S) > S$ for big enough $S$.

□ 

15.3.2 Diffusion on the Half-Line: The Method of Images

We consider the following initial-boundary value problem:

\[ v_t - \tfrac12 v_{xx} = 0 \qquad (t,x) \in (0,\infty) \times (0,\infty) \]
\[ v(0,x) = \phi(x) \quad \text{for } x > 0 \qquad (*) \]
\[ v(t,0) = 0 \quad \text{for } t > 0 \]

For intuition: This IBVP governs the evolution of the temperature of a semi-infinite bar
(corresponding to the half-line $0 \le x < \infty$) whose initial temperature is given by the function
$\phi(x)$, and whose end at $x = 0$ is kept at a constant temperature of 0°.






The trick in solving this problem is to set up a Cauchy problem on the whole line $-\infty < x < \infty$,
because we already know how to solve Cauchy problems on the whole line, by
Theorem 15.3.1. We have to enforce initial conditions $g(x) = \phi(x)$ for $x > 0$, but we have
some freedom in choosing initial conditions $g(x)$ for $x < 0$. All we need to do is to choose
these initial conditions to ensure that the boundary condition $v(t,0) = 0$ is forced to hold,
by employing symmetry.

In this (very simple) case, it is easy to figure out how to do it: If we ensure that the
temperature at the point $-x$ is minus the temperature at $x$, then the temperature at $x = 0$
must be zero:

\[ u(t,-x) = -u(t,x) \ \Rightarrow\ u(t,0) = u(t,-0) = -u(t,0) \ \Rightarrow\ u(t,0) = 0 \]

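This suggests the following Cauchy problem:

\[ u_t - \tfrac12 u_{xx} = 0 \qquad (t,x) \in (0,\infty) \times (-\infty,\infty) \]
\[ u(0,x) = g(x) \]

where

\[ g(x) = \begin{cases} \phi(x) & \text{for } x > 0 \\ -\phi(-x) & \text{for } x < 0 \end{cases} \]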



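Theorem 15.3.1 requires that $g$ be continuous, but $g$ may have a discontinuity at $x = 0$. We
treat this technical point cavalierly, and write down the solution suggested by Theorem 15.3.1:

\[ u(t,x) = \int_{-\infty}^{\infty} \Phi(t, x - y)\, g(y)\, dy \]
\[ = \frac{1}{\sqrt{2\pi t}} \int_0^{\infty} e^{-(x-y)^2/2t}\, \phi(y)\, dy + \frac{1}{\sqrt{2\pi t}} \int_{-\infty}^{0} e^{-(x-y)^2/2t}\, \big( -\phi(-y) \big)\, dy \]
\[ = \frac{1}{\sqrt{2\pi t}} \int_0^{\infty} \Big[ e^{-(x-y)^2/2t} - e^{-(x+y)^2/2t} \Big]\, \phi(y)\, dy \]
\[ = \int_0^{\infty} \big[ \Phi(t, x - y) - \Phi(t, x + y) \big]\, \phi(y)\, dy \]

Thus the solution to the IBVP (*) is given by:

\[ v(t,x) = \int_0^{\infty} \big[ \Phi(t, x - y) - \Phi(t, x + y) \big]\, \phi(y)\, dy \qquad 0 < x < \infty,\ 0 < t < \infty \]

where $\Phi$ is the fundamental solution of the one-dimensional heat equation.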

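Example 15.3.3 If we solve (*) with $\phi(x) = 1$, we obtain

\[ v(t,x) = \mathbb{P}\big(Z > -\tfrac{x}{\sqrt t}\big) - \mathbb{P}\big(Z > +\tfrac{x}{\sqrt t}\big) = N\big(\tfrac{x}{\sqrt t}\big) - N\big(-\tfrac{x}{\sqrt t}\big) \]

where $Z$ is a standard normal random variable, and $N(\cdot)$ the standard normal distribution
function.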

□ 
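The method-of-images formula and the closed form of Example 15.3.3 can be compared numerically. The sketch below is an illustration only: the function names, the midpoint rule and the truncation of the half-line integral are assumptions, not part of the notes.

```python
import math

def Phi(t, x):
    """One-dimensional fundamental solution of u_t = (1/2) u_xx."""
    return math.exp(-x * x / (2.0 * t)) / math.sqrt(2.0 * math.pi * t)

def std_normal_cdf(z):
    """Standard normal distribution function N(z), via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def v_images(t, x, phi, n=20000, cutoff=12.0):
    """Midpoint approximation of int_0^inf [Phi(t,x-y) - Phi(t,x+y)] phi(y) dy.

    The upper limit is truncated at x + cutoff * sqrt(t) (an assumption that
    is harmless for bounded phi, since the kernel decays like a Gaussian).
    """
    upper = x + cutoff * math.sqrt(t)
    h = upper / n
    total = 0.0
    for k in range(n):
        y = (k + 0.5) * h
        total += (Phi(t, x - y) - Phi(t, x + y)) * phi(y)
    return total * h

def v_closed_form(t, x):
    """Closed form for phi = 1: N(x / sqrt(t)) - N(-x / sqrt(t))."""
    z = x / math.sqrt(t)
    return std_normal_cdf(z) - std_normal_cdf(-z)

if __name__ == "__main__":
    one = lambda y: 1.0
    for (t, x) in [(0.5, 0.3), (2.0, 1.0), (0.5, 0.0)]:
        print(t, x, v_images(t, x, one), v_closed_form(t, x))
```

At $x = 0$ the two kernel terms cancel exactly, so the boundary condition $v(t,0) = 0$ holds by construction.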






15.4 Applications to Finance 
15.4.1 The Black-Scholes Option Formula 

The following exercise derives the Black-Scholes option price formula by transforming the 
Black-Scholes PDE to a heat equation, and using the fundamental solution. 

Recall the Black-Scholes PDE and boundary condition for the price of a call option on a
share $S$ with strike $K$ and expiry $T$:

\[ C_t + \tfrac12 \sigma^2 S^2 C_{SS} + r S C_S - rC = 0 \]
\[ C(T,S) = (S - K)^+ \]

where $C = C(t,S)$.

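Exercise 15.4.1 (a) First we reduce this PDE to the one-dimensional heat equation:
(a.1) Define $x = \ln\frac{S}{K}$, $\tau = \sigma^2(T - t)$, and show that we obtain

\[ v_\tau = \tfrac12 v_{xx} + \gamma v_x - (\gamma + \tfrac12) v \]
\[ v(0,x) = (e^x - 1)^+ \]

where $v(\tau,x) := \frac{1}{K} C(t,S)$ and $\gamma = \frac{r}{\sigma^2} - \frac12$.
(a.2) Now define $u(\tau,x)$ by $v(\tau,x) = e^{\alpha x + \beta \tau} u(\tau,x)$. Show that

\[ u_\tau = \tfrac12 u_{xx} + (\alpha + \gamma) u_x + \big( \tfrac12 \alpha^2 + \gamma\alpha - (\gamma + \tfrac12) - \beta \big) u \]

(a.3) Finally, show that when $\alpha := -\gamma$ and $\beta := -\tfrac12 (\gamma + 1)^2$, the boundary value problem
in (a.1) reduces to

\[ u_\tau = \tfrac12 u_{xx} \qquad (**) \]
\[ u(0,x) = e^{\gamma x} (e^x - 1)^+ \]

(b) The solution $\Phi(\tau, x)$ to the Cauchy problem

\[ u_\tau = \tfrac12 u_{xx} \]
\[ u(0,x) = \delta_0 \]

(where $\delta_0$ is the Dirac delta function) is called the fundamental solution of the heat
equation, and is given by

\[ \Phi(\tau,x) = \frac{1}{\sqrt{2\pi\tau}}\, e^{-x^2/2\tau} \]

Explain why the solution of a boundary value problem

\[ u_\tau = \tfrac12 u_{xx} \]
\[ u(0,x) = f(x) \]

is given by

\[ u(\tau,x) = \int_{-\infty}^{\infty} \Phi(\tau, x - y)\, f(y)\, dy \]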






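(c) We now solve for the option price: Let $N(x) := \int_{-\infty}^{x} \Phi(1,y)\, dy$ and note that $N(x) = \mathbb{P}(Z \le x)$,
where $Z$ is a standard normal random variable.

(c.1) Use (b) to show that the solution to the BVP (**) in (a.3) is

\[ u(\tau,x) = e^{(\gamma+1)x + \frac12 (\gamma+1)^2 \tau}\, N\Big( \frac{x + (\gamma+1)\tau}{\sqrt\tau} \Big) - e^{\gamma x + \frac12 \gamma^2 \tau}\, N\Big( \frac{x + \gamma\tau}{\sqrt\tau} \Big) \]

(c.2) Finally, note that (with $\tau = \sigma^2 T$ at $t = 0$)

\[ C(0,S) = K v(\tau,x) = K e^{-\gamma x - \frac12 (\gamma+1)^2 \tau}\, u(\tau,x) \]

and deduce that

\[ C(0,S) = S N(d_+) - K e^{-rT} N(d_-) \qquad d_\pm = \frac{\ln\frac{S}{K} + (r \pm \frac12 \sigma^2) T}{\sigma \sqrt T} \]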



□ 
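The conclusion of the exercise can be sanity-checked numerically. The sketch below is illustrative and not part of the notes: `bs_call` evaluates the closed formula of (c.2), while `bs_call_via_heat` follows the route of parts (a) and (b), integrating the transformed payoff $e^{\gamma y}(e^y - 1)^+$ against the heat kernel and undoing the change of variables. The function names, the midpoint rule and the truncation of the integral are assumptions.

```python
import math

def N(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def bs_call(S, K, r, sigma, T):
    """Closed formula C(0,S) = S N(d+) - K exp(-rT) N(d-)."""
    vol = sigma * math.sqrt(T)
    d_plus = (math.log(S / K) + (r + 0.5 * sigma ** 2) * T) / vol
    d_minus = d_plus - vol
    return S * N(d_plus) - K * math.exp(-r * T) * N(d_minus)

def bs_call_via_heat(S, K, r, sigma, T, n=20000, cutoff=12.0):
    """Same price via the heat equation: C = K exp(alpha x + beta tau) u(tau, x)
    with alpha = -gamma, beta = -(gamma + 1)^2 / 2, and
    u(tau, x) = int_0^inf Phi(tau, x - y) exp(gamma y) (exp(y) - 1) dy.
    """
    gamma = r / sigma ** 2 - 0.5
    tau = sigma ** 2 * T
    x = math.log(S / K)
    # Truncate the half-line integral well beyond the bulk of the integrand.
    upper = max(x, 0.0) + (abs(gamma) + 1.0) * tau + cutoff * math.sqrt(tau)
    h = upper / n
    u = 0.0
    for k in range(n):
        y = (k + 0.5) * h
        kernel = math.exp(-(x - y) ** 2 / (2.0 * tau)) / math.sqrt(2.0 * math.pi * tau)
        u += kernel * math.exp(gamma * y) * (math.exp(y) - 1.0)
    u *= h
    return K * math.exp(-gamma * x - 0.5 * (gamma + 1.0) ** 2 * tau) * u

if __name__ == "__main__":
    args = (100.0, 100.0, 0.05, 0.2, 1.0)
    print(bs_call(*args), bs_call_via_heat(*args))
```

Both routes should agree to within the quadrature error, which is a useful check on the signs of $\alpha$ and $\beta$ in (a.3).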



15.4.2 Barrier Options 

A barrier option is a type of path-dependent option, i.e. the payoff of the option depends
on the price-path of the underlying security. We will consider a down-and-out call option:
This is like a standard (vanilla) call option $C$ on a stock $S$ with strike $K$ and expiry $T$, except
for one feature: If the underlying asset $S$ ever reaches a barrier level $L < K$ during the life of
the option, then it expires worthless. It can only have positive payoff when the minimum asset
price during the life of the option is $> L$. The payoff is therefore

\[ V(T,S) := (S_T - K)^+\, I_{\{\min_{0 \le t \le T} S_t > L\}} \]

To solve this using stochastic analysis is not a trivial task. We therefore attempt to solve the 
appropriate boundary value problem. The relevant PDE is just the Black-Scholes PDE. But 
in addition to the standard boundary condition for a call option 

\[ V(T,S) = (S - K)^+ \]

we must also take cognisance of the fact that $V = 0$ when $S = L$, i.e. that

\[ V(t,L) = 0 \quad \text{for all } t \le T \]



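The appropriate BVP is therefore

\[ V_t + \tfrac12 \sigma^2 S^2 V_{SS} + r S V_S - rV = 0 \]
\[ V(T,S) = (S - K)^+ \quad \text{for } S > L \qquad (*) \]
\[ V(t,L) = 0 \]

where $V = V(t,S)$.

We first reduce the PDE to the heat equation, as in section 15.4.1: Put $x = \ln\frac{S}{K}$, $\tau = \sigma^2(T - t)$, to obtain

\[ v_\tau = \tfrac12 v_{xx} + \gamma v_x - (\gamma + \tfrac12) v \]
\[ v(0,x) = (e^x - 1)^+ \quad \text{for } x > \ln\tfrac{L}{K} \]
\[ v(\tau, \ln\tfrac{L}{K}) = 0 \]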






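where $v(\tau,x) := \frac{1}{K} V(t,S)$ and $\gamma = \frac{r}{\sigma^2} - \frac12$. Now define $u(\tau,x)$ by $v(\tau,x) = e^{\alpha x + \beta\tau} u(\tau,x)$, so
that

\[ u_\tau = \tfrac12 u_{xx} + (\alpha + \gamma) u_x + \big( \tfrac12 \alpha^2 + \gamma\alpha - (\gamma + \tfrac12) - \beta \big) u \]

Finally, when $\alpha := -\gamma$ and $\beta := -\tfrac12 (\gamma + 1)^2$, the boundary value problem (*) reduces to

\[ u_\tau = \tfrac12 u_{xx} \]
\[ u(0,x) = e^{\gamma x} (e^x - 1)^+ \quad \text{for } x > x_0 \qquad (**) \]
\[ u(\tau, x_0) = 0 \]

where $x_0 := \ln\frac{L}{K}$.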

When we solved the Black-Scholes PDE for a call option in section 15.4.1, we reduced it to
a Cauchy problem, i.e. an initial value problem, without additional boundary conditions.
Here, we have the boundary condition $u(\tau, x_0) = 0$, however. But we can use the method of
images to turn this problem into an equivalent Cauchy problem, which would allow us to use
the fundamental solution of the heat equation.

For intuition, consider $u$ to be the temperature of an infinite bar. The initial condition only
imposes constraints on the region of the bar where $x > x_0$. To ensure that the temperature
is always zero at $x_0$, we attempt to impose initial conditions for the region $x < x_0$ which
guarantee this, i.e. which "cancel out" the temperature at $x_0$. In effect, we reflect the initial
data in the point $x_0$. Now the reflection of the point $x$ about $x_0$ is the point $\tilde x := x_0 - (x - x_0) = 2x_0 - x$.
If we ensure that the temperature at $\tilde x$ is precisely the negative of the temperature
at $x$, then, by symmetry, the temperature at $x_0$ must be zero. We therefore want

\[ u(0,x) = -u(0,\tilde x) = -u(0, 2x_0 - x) \quad \text{for } x < x_0 \]

Therefore, let $u$ be the solution to the following Cauchy problem:

\[ u_\tau = \tfrac12 u_{xx} \]
\[ u(0,x) = e^{\gamma x} (e^x - 1)^+ \quad \text{for } x > x_0 \qquad (***) \]
\[ u(0,x) = -e^{\gamma(2x_0 - x)} (e^{2x_0 - x} - 1)^+ \quad \text{for } x < x_0 \]

We can write down the solution directly in terms of an integral involving the fundamental
solution of the heat equation, but it turns out that a clever use of symmetry and superposition
will allow us to write the solution in terms of the solution obtained in section 15.4.1.

The linearity of the heat equation suggests that we break up the Cauchy problem for $u$
into two separate problems: The first part is:

\[ u^1_\tau = \tfrac12 u^1_{xx} \qquad (\dagger) \]
\[ u^1(0,x) = e^{\gamma x} (e^x - 1)^+ \]

The other part, i.e. the Cauchy problem with antisymmetric data, is

\[ u^2_\tau = \tfrac12 u^2_{xx} \qquad (\ddagger) \]
\[ u^2(0,x) = -e^{\gamma(2x_0 - x)} (e^{2x_0 - x} - 1)^+ \]

By linearity, the sum $u^1 + u^2$ is again a solution of the heat equation. The initial conditions
for the solution $u^1 + u^2$ are

\[ u^1(0,x) + u^2(0,x) = u^1(0,x) - u^1(0, 2x_0 - x) = e^{\gamma x} (e^x - 1)^+ - e^{\gamma(2x_0 - x)} (e^{2x_0 - x} - 1)^+ \]






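We examine the initial condition more closely: Note that $x_0 = \ln\frac{L}{K} < 0$, as $L < K$. Thus

\[ e^{\gamma x} (e^x - 1)^+ - e^{\gamma(2x_0 - x)} (e^{2x_0 - x} - 1)^+ = \begin{cases} e^{\gamma x} (e^x - 1)^+ & \text{if } x > 0 \\ -e^{\gamma(2x_0 - x)} (e^{2x_0 - x} - 1)^+ & \text{if } x < 2x_0 \\ 0 & \text{if } 2x_0 \le x \le 0 \end{cases} \]
\[ = \begin{cases} e^{\gamma x} (e^x - 1)^+ & \text{if } x > x_0 \\ -e^{\gamma(2x_0 - x)} (e^{2x_0 - x} - 1)^+ & \text{if } x < x_0 \end{cases} \]

because $2x_0 < x_0 < 0$. Thus $u^1 + u^2$ satisfies the same initial conditions as (***) and thus,
by uniqueness of the solution, we see that

\[ u = u^1 + u^2 \text{ is the solution of } (***) \]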

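We already know from section 15.4.1 how to solve the first part, namely

\[ u^1(\tau,x) = \tfrac{1}{K}\, e^{-\alpha x - \beta\tau}\, C(t,S) \]

where $C(t,S)$ is the formula for the time-$t$ price of a call option on $S$ with strike $K$ and
expiry $T$.

Because the heat equation is invariant under spatial translations and reflections, we see
that $u^2(\tau,x) := -u^1(\tau, 2x_0 - x)$ is a solution of the heat equation, because $u^1(\tau,x)$ is. Moreover,
the initial conditions are $u^2(0,x) = -u^1(0, 2x_0 - x)$, which means that $u^2$ solves $(\ddagger)$. Thus

\[ u^2(\tau,x) = -\tfrac{1}{K}\, e^{-\alpha(2x_0 - x) - \beta\tau}\, C\big(t, \tfrac{L^2}{S}\big) \]

Here, the expression $\frac{L^2}{S}$ is a result of the fact that replacing $x$ by $2x_0 - x$ is equivalent to
replacing $S$ by $\frac{L^2}{S}$: We have $S = K e^x$, and so $K e^{2x_0 - x} = K e^{2\ln\frac{L}{K} - \ln\frac{S}{K}} = K \cdot \frac{L^2}{K^2} \cdot \frac{K}{S} = \frac{L^2}{S}$.
It follows that

\[ u(\tau,x) = u^1(\tau,x) + u^2(\tau,x) = \tfrac{1}{K}\, e^{-\alpha x - \beta\tau} \Big[ C(t,S) - e^{-2\alpha(x_0 - x)}\, C\big(t, \tfrac{L^2}{S}\big) \Big] \]

Now

\[ e^{-2\alpha(x_0 - x)} = e^{2\gamma(\ln\frac{L}{K} - \ln\frac{S}{K})} = \Big( \frac{L}{S} \Big)^{2\gamma} \]

Since the solution $V(t,S)$ of the original problem (*) is just $K e^{\alpha x + \beta\tau} u(\tau,x)$, we have

\[ V(t,S) = C(t,S) - \Big( \frac{L}{S} \Big)^{2\gamma} C\big(t, \tfrac{L^2}{S}\big) \]

where $C(t,S)$ is the formula for the time-$t$ price of a call option on $S$ with strike $K$ and
expiry $T$.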

We have now priced a down-and-out option with strike K and barrier level L < K. Up- 
and-out options can be valued similarly. For knock-in options, in-out parity can be used, 
e.g. the equation 

Down-and-Out Call + Down-and-In Call = Vanilla Call 



must clearly hold by no-arbitrage. 
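As a rough numerical illustration of the pricing formula $V(t,S) = C(t,S) - (L/S)^{2\gamma} C(t, L^2/S)$ with $\gamma = r/\sigma^2 - \tfrac12$, the sketch below evaluates it with the standard Black-Scholes call formula. The function names and the sample parameters are assumptions made for the example; the guard for $S \le L$ simply encodes that the option has already been knocked out.

```python
import math

def N(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def bs_call(t, S, K, r, sigma, T):
    """Time-t Black-Scholes price of a call with strike K and expiry T > t."""
    ttm = T - t                       # time to maturity
    vol = sigma * math.sqrt(ttm)
    d_plus = (math.log(S / K) + (r + 0.5 * sigma ** 2) * ttm) / vol
    d_minus = d_plus - vol
    return S * N(d_plus) - K * math.exp(-r * ttm) * N(d_minus)

def down_and_out_call(t, S, K, L, r, sigma, T):
    """V(t,S) = C(t,S) - (L/S)^(2 gamma) C(t, L^2/S), for barrier L < K."""
    if S <= L:
        return 0.0                    # barrier reached: option is worthless
    gamma = r / sigma ** 2 - 0.5
    return (bs_call(t, S, K, r, sigma, T)
            - (L / S) ** (2.0 * gamma) * bs_call(t, L * L / S, K, r, sigma, T))

def down_and_in_call(t, S, K, L, r, sigma, T):
    """In-out parity: down-and-in = vanilla - down-and-out."""
    return bs_call(t, S, K, r, sigma, T) - down_and_out_call(t, S, K, L, r, sigma, T)

if __name__ == "__main__":
    p = dict(K=100.0, L=90.0, r=0.05, sigma=0.2, T=1.0)
    for S in (90.000001, 95.0, 100.0, 120.0):
        print(S, down_and_out_call(0.0, S, **p), down_and_in_call(0.0, S, **p))
```

Just above the barrier the two terms nearly cancel, which reflects the boundary condition $V(t,L) = 0$ built in by the reflection.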






15.5 Distributions 
15.5.1 Basic Definitions

Here are some important definitions: Let $U \subseteq \mathbb{R}^n$ be an open set. Recall that the support of
a function $\phi : U \to \mathbb{R}$ is defined by

\[ \mathrm{supp}(\phi) = \mathrm{cl}\{ x \in \mathbb{R}^n : \phi(x) \neq 0 \} \]

i.e. the support of $\phi$ is the closure of the set where $\phi$ does not vanish.

Definition 15.5.1 (Test Functions) 

A test function is a $C^\infty$ function $\phi : U \to \mathbb{R}$ with compact support, i.e. a function that has
partial derivatives of all orders, and which vanishes outside some compact set $K \subseteq U$.

□ 



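An example of a test function is

\[ \phi_{x_0,\varepsilon}(x) := \begin{cases} e^{-\frac{1}{\varepsilon^2 - \|x - x_0\|^2}} & \text{if } \|x - x_0\| < \varepsilon \\ 0 & \text{else} \end{cases} \]

The function $\phi_{x_0,\varepsilon}$ vanishes outside a ball of radius $\varepsilon$ centered at $x_0$, and has partial derivatives
of all orders.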
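For experimentation, a one-dimensional bump function of this kind is easy to code. The sketch below assumes the exponent $-1/(\varepsilon^2 - (x - x_0)^2)$; the function name `bump` and its default arguments are chosen for illustration only.

```python
import math

def bump(x, x0=0.0, eps=1.0):
    """One-dimensional test function: exp(-1 / (eps^2 - (x - x0)^2)) inside
    the interval (x0 - eps, x0 + eps), and 0 outside it."""
    r2 = (x - x0) ** 2
    if r2 >= eps * eps:
        return 0.0
    return math.exp(-1.0 / (eps * eps - r2))

if __name__ == "__main__":
    for x in (0.0, 0.5, 0.9, 0.99, 1.0, 1.5):
        print(x, bump(x))
```

The values decay faster than any power of the distance to the boundary of the support, which is what makes all derivatives match the zero function there.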

The set of all test functions on $U$ is denoted by $\mathcal{D}(U)$. It is easy to verify (do so!)
that $\mathcal{D}(U)$ forms a linear space, with the usual (pointwise) operations of addition and scalar
multiplication. We can also equip $\mathcal{D}(U)$ with a topology:

Definition 15.5.2 (Convergence of Test Functions) 
If $\phi_n, \phi \in \mathcal{D}(U)$, we say that

\[ \phi_n \to \phi \quad \text{in } \mathcal{D}(U) \]

if and only if

(i) There is a compact set $K \subseteq U$ such that $\mathrm{supp}(\phi_n) \subseteq K$ for all $n$ (i.e. all the $\phi_n$ vanish
outside $K$).

(ii) $\phi_n$ and the partial derivatives of $\phi_n$ of arbitrary order converge uniformly to those of $\phi$.

□ 



Definition 15.5.3 (Distributions) 

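A distribution (or generalized function) is a continuous linear map $f : \mathcal{D}(U) \to \mathbb{R}$. This means
that

Linearity: $f(\alpha_1 \phi_1 + \alpha_2 \phi_2) = \alpha_1 f(\phi_1) + \alpha_2 f(\phi_2)$ for $\phi_1, \phi_2 \in \mathcal{D}(U)$, $\alpha_1, \alpha_2 \in \mathbb{R}$
Continuity: If $\phi_n \to \phi$ in $\mathcal{D}(U)$, then $f(\phi_n) \to f(\phi)$ (in $\mathbb{R}$)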

The set of distributions on U is denoted by 'D'{U). 






□ 

The notation

\[ (f, \phi) := f(\phi) \]

is often used to denote the action of a distribution on a test function.

The notion of distribution generalizes that of function (hence the name generalized function):
If $f : \mathbb{R}^n \to \mathbb{R}$ is an ordinary Lebesgue measurable function (subject to some integrability
conditions), we can consider it as a distribution by defining

\[ f(\phi) := \int_{\mathbb{R}^n} f(x)\phi(x)\, dx \]

It is easy to verify that the map $\phi \mapsto f(\phi)$ is a distribution: Firstly,

\[ f(\alpha_1\phi_1 + \alpha_2\phi_2) = \int_{\mathbb{R}^n} f(x)\, \big( \alpha_1\phi_1(x) + \alpha_2\phi_2(x) \big)\, dx \]
\[ = \alpha_1 \int_{\mathbb{R}^n} f(x)\phi_1(x)\, dx + \alpha_2 \int_{\mathbb{R}^n} f(x)\phi_2(x)\, dx \]
\[ = \alpha_1 f(\phi_1) + \alpha_2 f(\phi_2) \]

so $f(\phi)$ is linear. Secondly, if $\phi_n \to \phi$ in $\mathcal{D}(U)$, then the dominated convergence theorem will
usually ensure that $\int f(x)\phi_n(x)\, dx \to \int f(x)\phi(x)\, dx$, i.e. that $f(\phi_n) \to f(\phi)$.

Not every distribution is of the form $\int f(x)\phi(x)\, dx$, however, as we shall see shortly.
Nevertheless it is customary to write

\[ \int f(x)\phi(x)\, dx \quad \text{instead of } f(\phi) \]

even when $f$ is not an ordinary function, i.e. the following expressions all mean the same
thing:

\[ f(\phi) = (f, \phi) = \int_{\mathbb{R}^n} f(x)\phi(x)\, dx \]

The simplest example of a distribution which is not obtainable from an ordinary fuction 
in the above manner is: 

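Definition 15.5.4 (Delta Function)
The delta function is the distribution

\[ \delta_0 : \mathcal{D}(\mathbb{R}^n) \to \mathbb{R} : \phi \mapsto \phi(0) \]

i.e. it is the generalized function having

\[ \int \delta_0(x)\phi(x)\, dx = (\delta_0, \phi) = \phi(0) \quad \text{for all test functions } \phi \]

Similarly, if $x_0 \in \mathbb{R}^n$, we define the delta function $\delta_{x_0}$ by

\[ (\delta_{x_0}, \phi) := \phi(x_0) \]

i.e.

\[ \int \delta_{x_0}(x)\phi(x)\, dx = \phi(x_0) \]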






□ 

Another example of a one-dimensional distribution is the map $\phi \mapsto \phi''(0)$.

We recognize the delta function as the "density function" of a particular measure, the
point mass at 0. This is the probability distribution whose distribution function is given by
the Heaviside function:

\[ H(x) := \begin{cases} 1 & \text{if } x \ge 0 \\ 0 & \text{if } x < 0 \end{cases} \]

If $\mu$ is an arbitrary finite measure on $(\mathbb{R}^n, \mathcal{B}(\mathbb{R}^n))$ we can regard $\mu$ as generating a distribution
(i.e. a generalized function) as follows: For a test function $\phi$, we define

\[ (\mu, \phi) := \int_{\mathbb{R}^n} \phi(x)\, \mu(dx) \]

If $\mu$ is a probability distribution with a density $f$ (or more generally, if $\mu \ll \lambda$ with $\frac{d\mu}{d\lambda} = f$),
then

\[ \int \phi(x)\, \mu(dx) = \int f(x)\phi(x)\, dx = (f, \phi) \]

i.e. $(\mu, \phi) = (f, \phi)$, and so the distribution (generalized function) induced by $\mu$ corresponds to an
ordinary function, namely its density.

For novices, distributions are best regarded as a hybrid of familiar objects: They are 
partly function, partly measure. Some of the operations one can perform on functions can be 
performed on distributions, but others cannot. For example, it is always possible to multiply 
real-valued functions, but this is not true for distributions, even in the very simplest case: 

Example 15.5.5 If $f, g$ are ordinary functions of one variable, and $\phi$ a test function, then

\[ \int_{-\infty}^{\infty} (fg)\phi\, dx = \int_{-\infty}^{\infty} f\, (g\phi)\, dx = (f, g\phi) \]

where the latter expression makes sense if $g\phi$ is a test function, which is the case if $g$ is a
$C^\infty$-function. This suggests how to define multiplication of a distribution $f$ with an ordinary
$C^\infty$-function $g$:

\[ (fg, \phi) := (f, g\phi) \]

i.e. the product of a distribution $f$ and a smooth function $g$ is a distribution which assigns
the number $(f, g\phi)$ to each test function $\phi$. If $f$ is an ordinary function, then the distributional
product of $f, g$ coincides with the ordinary product.

Suppose we try to make sense of the product of the delta function with itself: $\delta_0^2 = \delta_0 \times \delta_0$.
Then

\[ (\delta_0^2, \phi) = (\delta_0, \delta_0\phi) = \delta_0(0)\phi(0) \]

and this makes no sense, because $\delta_0(0)$ makes no sense.

□ 

What makes distributions important for PDE theory is that we can differentiate them. If
an ordinary function is differentiable, then its derivative as a distribution will coincide with
its derivative as a function. However, even non-differentiable functions become differentiable,
provided we consider them as distributions instead of ordinary functions. In particular, if
distributions have partial derivatives, then they may be solutions of PDEs. We will define
the derivatives of a distribution shortly. First, we need to introduce some topology on the set
$\mathcal{D}'(U)$ of distributions, and for that, a notion of convergence will suffice.






15.5.2 Convergence of Distributions 
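Definition 15.5.6 (Convergence of distributions)

A sequence $f_n$ in $\mathcal{D}'(U)$ converges to $f \in \mathcal{D}'(U)$ if and only if $f_n(\phi) \to f(\phi)$ for all $\phi \in \mathcal{D}(U)$,
i.e.

\[ f_n \to f \text{ in } \mathcal{D}'(U) \iff \int f_n(x)\phi(x)\, dx \to \int f(x)\phi(x)\, dx \]
\[ \iff (f_n, \phi) \to (f, \phi) \quad \text{for all test functions } \phi \]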

□ 

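Example 15.5.7 Let $\Phi(t,x) := \frac{1}{\sqrt{2\pi t}}\, e^{-x^2/2t}$ be the fundamental solution of the one-dimensional
heat equation. In the previous section, we saw that

\[ \int_{\mathbb{R}} \Phi(t,x)\phi(x)\, dx \to \phi(0) \quad \text{as } t \to 0 \]

because $\int_{\mathbb{R}} \Phi(t,x)\phi(x)\, dx = \mathbb{E}[\phi(X)]$, where $X \sim N(0,t)$. Since $\phi(0) = \int_{\mathbb{R}} \delta_0(x)\phi(x)\, dx$, it
follows that

\[ \Phi(t,x) \to \delta_0 \quad \text{in } \mathcal{D}'(U) \]

We already know that the limit $\Phi(0,x) = \lim_{t \to 0} \Phi(t,x)$ cannot be an ordinary function. We now
see that it is a distribution.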

□ 

Remarks 15.5.8 1. Convergence of distributions is inherited from convergence of measures:
It resembles one found in probability and statistics, namely convergence in distribution of
random variables: If $X_n, X$ are random variables, then we say that $X_n \to X$ in distribution
if and only if the distribution functions $F_{X_n}(x)$ of the $X_n$ converge to the distribution
function $F_X(x)$ of $X$ at every point $x$ where $F_X$ is continuous. It can be shown, however,
that we have the following equivalence:

\[ X_n \to X \text{ in distribution} \iff \mathbb{E}[\phi(X_n)] \to \mathbb{E}[\phi(X)] \text{ for every bounded continuous } \phi \]

If $X_n, X$ have density functions $f_n, f$, then we have

\[ X_n \to X \text{ in distribution} \iff \int_{\mathbb{R}} f_n(x)\phi(x)\, dx \to \int_{\mathbb{R}} f(x)\phi(x)\, dx \text{ for every bounded continuous } \phi \]

and this looks very similar to the definition of convergence of distributions (i.e. of generalized functions).

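2. If $f_n, f$ are ordinary functions $\mathbb{R} \to \mathbb{R}$, then to say that $f_n \to f$ in distribution neither
implies nor is implied by $f_n \to f$ pointwise (or $\lambda$-a.e.). For example, the functions

\[ f_n(x) := n\, I_{(0, \frac1n]}(x) \]

have

\[ \int_{-\infty}^{\infty} f_n(x)\phi(x)\, dx = n \int_0^{1/n} \phi(x)\, dx = \text{average value of } \phi \text{ on } (0, \tfrac1n] \]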






Hence

\[ \int_{-\infty}^{\infty} f_n(x)\phi(x)\, dx \to \phi(0) \quad \text{as } n \to \infty \]

and so $f_n \to \delta_0$ in distribution, even though $f_n \to 0$ pointwise.

3. However, if the ordinary functions $f_n$ are dominated by an integrable function $g$, and if
$f_n \to f$ pointwise, then the dominated convergence theorem states that

\[ \int f_n(x)\phi(x)\, dx \to \int f(x)\phi(x)\, dx \]

and thus $f_n \to f$ in distribution as well.

□ 
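The example in Remark 2 can be explored numerically. The sketch below is an illustration under stated assumptions: the test function is a bump function with exponent $-1/(1 - x^2)$ (our choice, not fixed by the notes), and the pairing $(f_n, \phi) = n \int_0^{1/n} \phi(x)\, dx$ is approximated by a midpoint rule.

```python
import math

def phi(x):
    """A test function supported on [-1, 1] (assumed form of the bump)."""
    return math.exp(-1.0 / (1.0 - x * x)) if abs(x) < 1.0 else 0.0

def f_n(n, x):
    """f_n = n times the indicator of (0, 1/n]."""
    return float(n) if 0.0 < x <= 1.0 / n else 0.0

def pair_fn(n, test, m=2000):
    """(f_n, test) = n * int_0^{1/n} test(x) dx, by the midpoint rule."""
    h = (1.0 / n) / m
    return n * h * sum(test((k + 0.5) * h) for k in range(m))

if __name__ == "__main__":
    for n in (1, 10, 100, 1000):
        # The pairing tends to phi(0), while f_n(x) -> 0 at each fixed x.
        print(n, pair_fn(n, phi), phi(0.0), f_n(n, 0.05))
```

The last column becomes 0 as soon as $1/n < 0.05$, while the pairing settles at $\phi(0)$: the pointwise limit and the distributional limit differ.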



15.5.3 Differentiation of Distributions 

If $f : \mathbb{R} \to \mathbb{R}$ is an ordinary function, and $\phi$ a test function, then integration by parts yields

\[ \int_{-\infty}^{\infty} f'(x)\phi(x)\, dx = f(x)\phi(x) \Big|_{-\infty}^{\infty} - \int_{-\infty}^{\infty} f(x)\phi'(x)\, dx = - \int_{-\infty}^{\infty} f(x)\phi'(x)\, dx \]

because $\phi(x)$ has compact support, and is therefore 0 at $\pm\infty$. (Remember that $\int_{-\infty}^{\infty}$ means
$\lim_{a \to -\infty} \lim_{b \to \infty} \int_a^b$ and note that $\phi(a) = \phi(b) = 0$ for big enough $|a|, b$.) Thus we have

\[ (f', \phi) = -(f, \phi') \]

for ordinary functions $f$ and test functions $\phi$. This suggests that we define the derivative of
a generalized function (distribution) $f$ in the same way: $f'$ is the map $f' : \mathcal{D}(U) \to \mathbb{R}$ defined
by

\[ f'(\phi) := -f(\phi') \quad \text{i.e.} \quad (f', \phi) := -(f, \phi') \]

Note that if $\phi$ is a test function, then so is $\phi'$, so $(f, \phi')$ is well defined when $f$ is a distribution.
In the multidimensional case, we adopt the same definition:

Definition 15.5.9 (Differentiation of Distributions) 

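Let $U \subseteq \mathbb{R}^n$ be an open set, and let $f \in \mathcal{D}'(U)$ be a distribution on $U$. The partial derivative
of $f$ w.r.t. $x_j$ is defined as follows: $\frac{\partial f}{\partial x_j}$ is that distribution whose action on test functions
is given by

\[ \Big( \frac{\partial f}{\partial x_j}, \phi \Big) := - \Big( f, \frac{\partial \phi}{\partial x_j} \Big) \]

□

Note that

\[ \Big( \frac{\partial^2 f}{\partial x_j \partial x_k}, \phi \Big) = - \Big( \frac{\partial f}{\partial x_k}, \frac{\partial \phi}{\partial x_j} \Big) = \Big( f, \frac{\partial^2 \phi}{\partial x_j \partial x_k} \Big) \]

so the sign depends on the number of differentiations performed.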






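If $f$ is an ordinary differentiable function, then the integration-by-parts formula ensures
that the distributional derivatives of $f$ are the same as its ordinary derivatives. For example
(taking the $x_1$-coordinate for notational convenience), we have

\[ \int_{\mathbb{R}^n} \frac{\partial f}{\partial x_1}(x)\phi(x)\, dx = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} \Big( \int_{-\infty}^{\infty} \frac{\partial f}{\partial x_1}(x_1, \dots, x_n)\, \phi(x_1, \dots, x_n)\, dx_1 \Big)\, dx_2 \dots dx_n \]
\[ = - \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} \Big( \int_{-\infty}^{\infty} f(x_1, \dots, x_n)\, \frac{\partial \phi}{\partial x_1}(x_1, \dots, x_n)\, dx_1 \Big)\, dx_2 \dots dx_n \]
\[ = - \int_{\mathbb{R}^n} f(x)\, \frac{\partial \phi}{\partial x_1}(x)\, dx \]

and so

\[ \int_{\mathbb{R}^n} \frac{\partial f}{\partial x_1}(x)\phi(x)\, dx = - \Big( f, \frac{\partial \phi}{\partial x_1} \Big) \]

i.e. the ordinary partial derivative of $f$ coincides with its distributional partial derivative. The
operation of differentiation of distributions therefore extends the operation of differentiation
of functions. Moreover, it is always defined, i.e. every distribution is differentiable, and the
derivative of a distribution is another distribution. In particular, even non-differentiable
functions are differentiable, provided we allow their derivatives to be distributions.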

Examples 15.5.10 1. Let's compute the (distributional) derivative of a non-differentiable
function, namely the Heaviside function

\[ H(x) := \begin{cases} 1 & \text{if } x \ge 0 \\ 0 & \text{if } x < 0 \end{cases} \]

We have

\[ (H', \phi) = - \int_{-\infty}^{\infty} H(x)\phi'(x)\, dx = - \int_0^{\infty} \phi'(x)\, dx = -(\phi(\infty) - \phi(0)) = \phi(0) \]

because $\phi$ has compact support. Hence

\[ (H', \phi) = (\delta_0, \phi) \quad \text{for all test functions } \phi \]

and so $H' = \delta_0$, i.e. the delta function is the distributional derivative of the Heaviside
function.

2. Let's compute the derivative of the delta function:

\[ (\delta_0', \phi) = -(\delta_0, \phi') = -\phi'(0) \]

So $\delta_0'$ is that functional which assigns to each test function minus its derivative at zero.

□ 
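The identity $(H', \phi) = -\int_0^\infty \phi'(x)\, dx = \phi(0)$ can be verified numerically for a concrete test function. In the sketch below, the test function is a bump centred at $0.3$ with radius $1$ and exponent $-1/(1 - (x - 0.3)^2)$; this choice, the function names and the midpoint rule are all assumptions made for illustration.

```python
import math

X0, EPS = 0.3, 1.0   # centre and radius of the (assumed) bump test function

def phi(x):
    r = x - X0
    if abs(r) >= EPS:
        return 0.0
    return math.exp(-1.0 / (EPS * EPS - r * r))

def dphi(x):
    """Exact derivative: phi'(x) = phi(x) * (-2 r) / (eps^2 - r^2)^2."""
    r = x - X0
    if abs(r) >= EPS:
        return 0.0
    return phi(x) * (-2.0 * r) / (EPS * EPS - r * r) ** 2

def pair_H_prime(n=20000):
    """(H', phi) := -(H, phi') = -int_0^inf phi'(x) dx (midpoint rule).
    The integrand vanishes beyond X0 + EPS, so we integrate over [0, X0 + EPS]."""
    upper = X0 + EPS
    h = upper / n
    return -h * sum(dphi((k + 0.5) * h) for k in range(n))

def pair_delta_prime():
    """(delta_0', phi) := -(delta_0, phi') = -phi'(0)."""
    return -dphi(0.0)

if __name__ == "__main__":
    print(pair_H_prime(), phi(0.0))   # the two numbers should agree
    print(pair_delta_prime())
```

Centring the bump away from the origin ensures $\phi'(0) \neq 0$, so the second pairing is not trivially zero.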

Note that distributional derivatives are very well-behaved w.r.t. convergence: If $f_n, f$ are
one-dimensional distributions, and if $f_n \to f$ in distribution, then also $f_n' \to f'$ in distribution.
For if $\phi$ is a test function, we have

\[ \lim_n (f_n', \phi) = -\lim_n (f_n, \phi') = -(f, \phi') = (f', \phi) \]






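where we have $\lim_n (f_n, \phi') = (f, \phi')$ because $f_n \to f$ in distribution and $\phi'$ is a test function,
hence

\[ (f_n', \phi) \to (f', \phi) \quad \text{for every test function } \phi \]

and so $f_n' \to f'$ in distribution. Exactly the same argument shows that in the $n$-dimensional
case,

\[ f_n \to f \text{ in distribution} \ \Rightarrow\ \frac{\partial f_n}{\partial x_j} \to \frac{\partial f}{\partial x_j} \text{ in distribution} \]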

15.6 Green's Functions Revisited 

Since we can differentiate distributions, they may be considered as the objects in differential
equations, i.e. in a PDE like $u_t - \tfrac12 \Delta u = f$, both $u$ and $f$ may be taken to be distributions.
The PDE will always make sense for distributions, because they are always differentiable.
Moreover, when u, f are restricted to be sufficiently smooth functions, the distributional and 
ordinary derivatives are the same thing. In this way, the set of possible solutions to a PDE 
is made bigger, and also nicer. We briefly discuss how fundamental solutions and Green's 
functions can be phrased much more naturally and intuitively in terms of delta functions. 
This is the beginning of a long and beautiful story. 

15.6.1 Laplace's Equation 

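Consider the Laplacian operator $\Delta$, operating on a distribution $u$. Since $\big( \frac{\partial^2 u}{\partial x_i^2}, \phi \big) = \big( u, \frac{\partial^2 \phi}{\partial x_i^2} \big)$
for any test function $\phi$, we see that $\Delta u$ is the distribution defined by

\[ (\Delta u, \phi) = (u, \Delta\phi) \]

Now we saw earlier that if $\phi$ is a test function, then

\[ \phi(0) = - \int_{\mathbb{R}^n} \Psi(x)\, \Delta\phi(x)\, dx \]

where $\Psi$ is the fundamental solution to Laplace's equation. Thus

\[ (\delta_0, \phi) = \phi(0) = -(\Psi, \Delta\phi) = (-\Delta\Psi, \phi) \]

i.e. the fundamental solution $\Psi$ is a solution to the PDE

\[ -\Delta\Psi = \delta_0 \]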

Distributions often provide the best way to think about fundamental solutions and Green's 
functions for constant coefficient PDEs. 

Consider the Dirichlet problem for the Poisson equation

\[ -\Delta u = f \quad \text{in } U \]
\[ u = 0 \quad \text{on } \partial U \]

If $G$ is the Green's function for the region $U$, then we know that the solution is given by

\[ u(x_0) = \int_U G(x_0, x)\, f(x)\, dx \]






(The surface integral vanishes because of the homogeneous boundary conditions.) Thus we
have (with $x_0$ fixed, and $x$ the variable of differentiation, and writing $G_{x_0}(x) := G(x_0, x)$)

\[ (\delta_{x_0}, u) = u(x_0) = (G(x_0, x), f(x)) = (G_{x_0}, -\Delta u) = (-\Delta G_{x_0}, u) \]

from which it follows that, for fixed $x_0$, the Green's function $x \mapsto G(x_0, x)$ is the solution of
the BVP

\[ -\Delta G = \delta_{x_0} \quad \text{in } U \]
\[ G = 0 \quad \text{on } \partial U \]

i.e. $G$ is the unique distribution which solves this BVP.
15.6.2 The Heat Equation 

You should already have seen that the fundamental solution $\Phi$ of the heat equation can also
be related to the delta function. We repeat the argument: We know that the function

\[ u(t, x_0) := \int_{\mathbb{R}^n} \Phi(t, x - x_0)\, g(x)\, dx \]

is the (only nice) solution of the Cauchy problem

\[ u_t - \tfrac12 \Delta u = 0 \]
\[ u(0, x) = g(x) \]

and that

\[ u(t, x) \to g(x) \quad \text{as } t \to 0^+ \]

Thus, with $x_0$ fixed, it follows that

\[ (\Phi(t, x_0 - x), g) = u(t, x_0) \to g(x_0) = (\delta_{x_0}, g) \]

for all $g$ and thus all test functions. In particular, with $x_0 = 0$, and noting that $\Phi(t, x) = \Phi(t, -x)$, we have

\[ (\Phi(t, x), g) \to (\delta_0, g) \quad \text{as } t \to 0^+ \]

and thus that

\[ \Phi(t, x) \to \delta_0(x) \quad \text{in distribution} \]

So the fundamental solution $\Phi$ is the solution to the BVP

\[ \Phi_t - \tfrac12 \Delta\Phi = 0 \quad \text{in } (0, \infty) \times \mathbb{R}^n \]
\[ \Phi(0, x) = \delta_0 \]






Chapter 16 

The Radon-Nikodym Theorem



16.1 Definitions and Statement of Radon-Nikodym Theorem

We begin with some definitions: 

Definition 16.1.1 Let $\nu, \mu$ be measures on a measurable space $(S, \mathcal{S})$.

(i) We say that $\nu$ is absolutely continuous w.r.t. $\mu$, and write $\nu \ll \mu$, iff $\mu A = 0$ implies
$\nu A = 0$ for all $A \in \mathcal{S}$.

(ii) $\mu, \nu$ are equivalent iff $\nu \ll \mu$ and $\mu \ll \nu$.

(iii) We say that $\mu, \nu$ are mutually singular, and write $\mu \perp \nu$, iff there exists $A \in \mathcal{S}$ such that
$\mu A = \nu A^c = 0$.

□

Remarks 16.1.2 (a) Two measures are equivalent iff they have the same null sets.

(b) Two probability measures are equivalent iff they have the same null sets, iff they have
the same sets of measure 1, iff they have the same sets of positive measure.

(c) Two measures are mutually singular iff their "masses" are concentrated on disjoint sets:
If $\mu A = \nu A^c = 0$, then all the mass of $\mu$ lies in $A^c$, and all the mass of $\nu$ lies in $A$.

□

Examples 16.1.3 (a) Suppose that $(S, \mathcal{S}, \mu)$ is a measure space, and that $f \in m\mathcal{S}^+$. Recall
from Propn. 6.4.1 that there is a measure $\nu = f \cdot \mu$ on $(S, \mathcal{S})$, defined by

\[ \nu A = \int_A f\, d\mu \]

It is clear that $\nu \ll \mu$.

The map $f$ is called the density, or Radon-Nikodym derivative, of $\nu$ w.r.t. $\mu$, and also
denoted $\frac{d\nu}{d\mu}$.

The Radon-Nikodym Theorem (below) states that the above way of constructing an
absolutely continuous measure is the only way to do so: If $\nu \ll \mu$, then $\nu$ has a density,
i.e. then $\nu = f \cdot \mu$ for some non-negative measurable $f$.

Also recall that if $\nu = f \cdot \mu$, then $\nu g = \mu(fg)$ (cf. Propn. 6.4.2, the Chain Rule).




Radon-Nikodym Theorem 



(b) Clearly any point mass is singular w.r.t. Lebesgue measure on $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$, e.g. $\delta_0 \perp \lambda$.

□

Here is the main result of this section:
Theorem 16.1.4 (Radon-Nikodym)

Suppose that $\nu, \mu$ are $\sigma$-finite measures on $(S, \mathcal{S})$ and that $\nu \ll \mu$. Then there exists a $\mu$-a.e.
unique $f \in m\mathcal{S}^+$ such that $\nu = f \cdot \mu$.

□ 
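On a finite set (with every subset measurable) the theorem is elementary, and this toy case is a useful picture to keep in mind: the density is just the ratio of point masses wherever $\mu$ charges a point. The sketch below is an illustration only; the dictionaries `mu` and `nu` and all function names are made up for the example, and exact rational arithmetic is used so that the identities hold without rounding.

```python
from fractions import Fraction

def measure(weights, A):
    """Measure of a set A of points, given the point masses."""
    return sum((weights[s] for s in A), Fraction(0))

def is_absolutely_continuous(nu, mu):
    """On a finite set, nu << mu iff every mu-null point is nu-null."""
    return all(nu[s] == 0 for s in mu if mu[s] == 0)

def density(nu, mu):
    """A version of d(nu)/d(mu): nu{s}/mu{s} where mu{s} > 0, and 0 on
    mu-null points (any value there would do: f is only mu-a.e. unique)."""
    if not is_absolutely_continuous(nu, mu):
        raise ValueError("nu is not absolutely continuous w.r.t. mu")
    return {s: (nu[s] / mu[s] if mu[s] != 0 else Fraction(0)) for s in mu}

# Hypothetical point masses on S = {a, b, c, d}.
mu = {"a": Fraction(1, 2), "b": Fraction(1, 3), "c": Fraction(1, 6), "d": Fraction(0)}
nu = {"a": Fraction(1, 4), "b": Fraction(3, 4), "c": Fraction(0), "d": Fraction(0)}
f = density(nu, mu)

if __name__ == "__main__":
    A = ["a", "b"]
    # nu(A) equals the integral of f over A with respect to mu.
    print(measure(nu, A), sum(f[s] * mu[s] for s in A))
```

The $\mu$-a.e. uniqueness in the theorem shows up here as the freedom to choose `f["d"]` arbitrarily.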

The next subsection is devoted to its proof, which involves a number of interesting related 
results. 

We have the following versions of the Chain Rule and Reciprocal Rule for densities: 

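Proposition 16.1.5 (a) Suppose that $\nu, \eta, \mu$ are measures on a common measurable space,
and that $\nu \ll \eta$ and $\eta \ll \mu$. Then also $\nu \ll \mu$ and

\[ \frac{d\nu}{d\mu} = \frac{d\nu}{d\eta}\, \frac{d\eta}{d\mu} \quad \mu\text{-a.e.} \]

i.e. $f \cdot (g \cdot \mu) = (fg) \cdot \mu$.
(b) If $\nu \ll \mu$ and $\frac{d\nu}{d\mu} > 0$ $\mu$-a.e., then also $\mu \ll \nu$, and

\[ \frac{d\mu}{d\nu} = \Big( \frac{d\nu}{d\mu} \Big)^{-1} \quad \mu\text{-a.e.} \]

i.e. if $\nu = f \cdot \mu$ and $f > 0$, then $\mu = \frac{1}{f} \cdot \nu$.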

□ 

Exercise 16.1.6 Prove Propn. 16.1.5. 

□ 

16.2 Proof of the Radon-Nikodym Theorem; Related Results 

The proof of the Radon-Nikodym Theorem involves a number of intermediate steps, all 
interesting in their own right: We will prove the Hahn-, Jordan- and Lebesgue Decomposition 
Theorems in succession. The Radon-Nikodym Theorem is a special case of the latter. 
First, we generalize the concept of measure: 

Definition 16.2.1 Let $(S, \mathcal{S})$ be a measurable space. A signed measure is a countably additive
map $\alpha : \mathcal{S} \to [-\infty, \infty]$ such that $\alpha\emptyset = 0$.
$\alpha$ is said to be finite if $-\infty < \alpha A < +\infty$ for all $A \in \mathcal{S}$.
A set $P \in \mathcal{S}$ is called positive for $\alpha$ iff $\alpha A \ge 0$ for every measurable $A \subseteq P$.
Negative sets are defined analogously.



□ 






Remarks 16.2.2 (a) Instead of length, volume or mass, think of $\alpha A$ as the electric charge
of the set $A$.

(b) Note that $\emptyset$ is both positive and negative.

(c) Note that a signed measure $\alpha$ on $(S, \mathcal{S})$ cannot take on both the values $\pm\infty$. Indeed, if
$A \in \mathcal{S}$, then the sum $\alpha A + \alpha A^c$ must be defined, and must be equal to $\alpha S$. Sums of
the form $\infty - \infty$ and $(-\infty) + (+\infty)$ are not allowed. Hence if there is some $A$ for which
$\alpha A = +\infty$, then $\alpha S = +\infty$, and if there is some $B$ for which $\alpha B = -\infty$, then $\alpha S = -\infty$.
Hence we cannot find measurable $A, B$ such that $\alpha A = +\infty$, $\alpha B = -\infty$.

□ 

Exercise 16.2.3 Show that the union of countably many positive (negative) sets is positive 
(negative) . 

□ 

Exercise 16.2.4 Suppose that $\alpha$ is a signed measure on $(S, \mathcal{S})$. Suppose that $A_n \in \mathcal{S}$, for
$n \in \mathbb{N}$, and that either

(i) $A_n \uparrow A$, or

(ii) $A_n \downarrow A$, and $-\infty < \alpha A_n < \infty$ for some $n \in \mathbb{N}$.

Prove that $\alpha A_n \to \alpha A$.
[Hint: see Exercise 4.3.5.]

□ 

Theorem 16.2.5 (Hahn Decomposition Theorem) 

Suppose that $\alpha$ is a signed measure on the measurable space $(S, \mathcal{S})$. Then there exists a positive
measurable set $S^+$ and a negative measurable set $S^-$ in $\mathcal{S}$ such that

\[ S^+ \cup S^- = S \qquad S^+ \cap S^- = \emptyset \]

Before we can prove this result, we need a definition and a lemma: 

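Lemma 16.2.6 Let $\alpha$ be a signed measure on $(S, \mathcal{S})$, and suppose $A \in \mathcal{S}$ satisfies $-\infty < \alpha A < 0$.
Then there is a negative set $B \in \mathcal{S}$ such that $B \subseteq A$ and $\alpha B \le \alpha A$.

Proof: We construct a sequence $\delta_n$ of non-negative (extended) reals and a sequence $A_n$ of
subsets of $A$ as follows: Let

\[ \delta_1 := \sup\{ \alpha E : E \in \mathcal{S},\ E \subseteq A \} \]

Then $\delta_1 \ge \alpha\emptyset = 0$. Choose $A_1 \in \mathcal{S}$ such that $A_1 \subseteq A$ and $\alpha A_1 \ge \frac{\delta_1}{2} \wedge 1$ (where we have taken
the $\wedge$ with 1 because it's not yet obvious that $\delta_1 < \infty$).
Given $A_1, \dots, A_{n-1}$, define

\[ \delta_n := \sup\Big\{ \alpha E : E \in \mathcal{S},\ E \subseteq A - \bigcup_{k=1}^{n-1} A_k \Big\} \]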






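(so that $\delta_n \ge 0$), and then choose a measurable $A_n \subseteq A - \bigcup_{k=1}^{n-1} A_k$ such that $\alpha A_n \ge \frac{\delta_n}{2} \wedge 1$.
Now put

\[ A_\infty := \bigcup_n A_n \qquad B := A - A_\infty \]

Note that $\alpha A_\infty = \sum_n \alpha A_n \ge 0$ (because the $A_n$ are mutually disjoint), and $\alpha A = \alpha A_\infty + \alpha B \ge \alpha B$.

It therefore remains to show that $B$ is a negative set. First note that $\delta_n \to 0$: For

\[ \alpha A = \alpha A_\infty + \alpha B \qquad \alpha A \text{ is finite} \]

together imply that $\alpha A_\infty$ is finite, and since $\alpha A_\infty = \sum_n \alpha A_n$, we must have $\alpha A_n \to 0$. Since
$0 \le \frac{\delta_n}{2} \wedge 1 \le \alpha A_n$, we must have $\delta_n \to 0$ as well.

Now if $E$ is a measurable subset of $B = A - A_\infty$, then $E$ is a subset of each $A - \bigcup_{k=1}^{n-1} A_k$ as
well, and hence $\alpha E \le \delta_n$, for all $n \in \mathbb{N}$. Thus $\alpha E \le 0$, proving that $B$ is negative.

⊣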

Proof of Hahn Decomposition Theorem: Since $+\infty, -\infty$ cannot both be amongst the
values of $\alpha$, assume that $-\infty < \alpha A \le +\infty$ for every $A \in \mathcal{S}$. Let

\[ a := \inf\{ \alpha N : N \text{ is negative} \} \]

Then $a \le \alpha\emptyset = 0$. Let $N_n$ be a sequence of negative sets such that $\alpha N_n \downarrow a$. The union of
negative sets is negative, so we may assume that $N_n$ is increasing. Let $S^- := \bigcup_n N_n$. By
Exercise 16.2.3, $S^-$ is a negative set. By Exercise 16.2.4, $\alpha S^- = \lim_{n \to \infty} \alpha N_n = a$, and thus $a$ is
finite.

Let $S^+ := S - S^-$. It remains to check that $S^+$ is positive. But if $A \subseteq S^+$ is a measurable
set with $\alpha A < 0$, then by Lemma 16.2.6 there is a negative set $B \subseteq A$ such that $\alpha B \le \alpha A < 0$. Now clearly
$S^- \cap B = \emptyset$, and so $S^- \cup B$ is a negative set with $\alpha(S^- \cup B) = \alpha S^- + \alpha B < \alpha S^- = a$,
contradicting the definition of $a$.

⊣

An immediate corollary is the following: 
Theorem 16.2.7 (Jordan Decomposition Theorem) 

Every signed measure is the difference of two mutually singular (non-negative) measures, at 
least one of which is finite. 

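Proof: Let $S^+$, $S^-$ be the subsets of $S$ given by the Hahn Decomposition Theorem. Define
two maps $\alpha^+, \alpha^- : \mathcal{S} \to [0, \infty]$ by

$$\alpha^+ A = \alpha(A \cap S^+) \qquad \alpha^- A = -\alpha(A \cap S^-)$$

Then clearly, $\alpha^+, \alpha^-$ are non-negative measures, and $\alpha = \alpha^+ - \alpha^-$. Now at least one of
$\alpha S^+$, $\alpha S^-$ is finite, and hence at least one of $\alpha^+, \alpha^-$ is a finite measure.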

H 
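As an illustration only, here is a minimal sketch of the Hahn and Jordan decompositions in the simplest possible setting: a signed measure on a finite set, given by point masses. The set, the weights and the helper names are assumptions made for this example and do not come from the text.

```python
from fractions import Fraction

# Hypothetical signed measure on S = {a, b, c, d}, given by point masses.
weights = {"a": Fraction(3), "b": Fraction(-2), "c": Fraction(0), "d": Fraction(-1, 2)}
S = set(weights)


def alpha(A):
    """Signed measure of a subset A of S (sum of its point masses)."""
    return sum((weights[s] for s in A), Fraction(0))


# Hahn decomposition: points of non-negative mass form a positive set,
# the remaining points form a negative set.
S_plus = {s for s in S if weights[s] >= 0}
S_minus = S - S_plus


def alpha_plus(A):
    """alpha^+ A = alpha(A intersect S^+)."""
    return alpha(set(A) & S_plus)


def alpha_minus(A):
    """alpha^- A = -alpha(A intersect S^-)."""
    return -alpha(set(A) & S_minus)


A = {"a", "b", "d"}
print(alpha(A), alpha_plus(A), alpha_minus(A))
```

In this finite setting the Jordan identity $\alpha = \alpha^+ - \alpha^-$ can be checked set by set; the general theorem is of course not proved by such a check.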






Theorem 16.2.8 (Lebesgue Decomposition) 

For $\sigma$-finite measures $\mu, \nu$ on a measurable space $(S,\mathcal{S})$, there exist unique measures $\nu_a, \nu_s$
on $(S,\mathcal{S})$ such that $\nu_a \ll \mu$, $\nu_s \perp \mu$ and $\nu = \nu_a + \nu_s$. Moreover, $\nu_a = f \cdot \mu$ for some $\mu$-a.e.
unique $f \in m\mathcal{S}^+$.

Before we can prove this theorem, we need two lemmas: 

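Lemma 16.2.9 On a measurable space $(S,\mathcal{S})$, suppose that we have two measures $\mu, \nu$, and a
sequence of measurable functions $f_n \in m\mathcal{S}^+$ such that $f_n \cdot \mu \le \nu$ (for $n \in \mathbb{N}$). Let $f = \sup_n f_n$.
Then also $f \cdot \mu \le \nu$.

Proof: Suppose that $f, g \in m\mathcal{S}^+$ with $f \cdot \mu,\ g \cdot \mu \le \nu$, and define $h := f \vee g$, $A := \{f > g\}$.
If $B \in \mathcal{S}$, then

$$(h \cdot \mu)B = \int_B h \, d\mu = \int_{A \cap B} f \, d\mu + \int_{A^c \cap B} g \, d\mu \le \nu(A \cap B) + \nu(A^c \cap B) = \nu B$$

and hence $h \cdot \mu \le \nu$.

Now, given the sequence $f_n$, define $g_n := f_1 \vee \dots \vee f_n$ (for $n \in \mathbb{N}$). Then $g_n \cdot \mu \le \nu$ and
$g_n \uparrow f$. By the MCT, $g_n \cdot \mu \uparrow f \cdot \mu$, and hence $f \cdot \mu \le \nu$ also.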

H 

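Lemma 16.2.10 Let $\mu, \nu$ be finite (non-negative) measures on a measurable space $(S,\mathcal{S})$,
such that $\nu \not\perp \mu$. Then there exists $f \in m\mathcal{S}^+$ such that $\mu f > 0$ and $f \cdot \mu \le \nu$.

Proof: For $n \in \mathbb{N}$, define a signed measure $\gamma_n := \nu - n^{-1}\mu$, and let $S_n^+, S_n^-$ be two sets given by
the Hahn Decomposition Theorem applied to $\gamma_n$. Since $\gamma_1 \le \gamma_2 \le \cdots$, we may assume that
$S_1^+ \subseteq S_2^+ \subseteq \cdots$, because if $n < m$, then $S_n^+$ is positive for $\gamma_m$. Let $A := \bigcup_n S_n^+$, so that
$A^c = \bigcap_n S_n^-$. Then for each $n \in \mathbb{N}$,

$$0 \le \nu A^c \le \nu S_n^- = \gamma_n S_n^- + n^{-1}\mu S_n^- \le n^{-1}\mu S$$

because $\gamma_n S_n^- \le 0$. Now $n^{-1}\mu S \to 0$ as $n \to \infty$, and hence $\nu A^c = 0$. Since $\nu \not\perp \mu$ we must
have $\mu A > 0$. It follows that $\mu S_n^+ > 0$ for some $n$, because $S_n^+ \uparrow A$. Define $f := n^{-1} 1_{S_n^+}$, so
that $\mu f > 0$. If $B \in \mathcal{S}$, then

$$(f \cdot \mu)B = n^{-1}\mu(S_n^+ \cap B) = \nu(S_n^+ \cap B) - \gamma_n(S_n^+ \cap B) \le \nu B$$

because $S_n^+$ is positive for $\gamma_n$, and hence $f \cdot \mu \le \nu$.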

H 

Proof of the Lebesgue Decomposition Theorem: First assume that $\mu, \nu$ are finite
measures. Let

$$\mathcal{C} := \{f \in m\mathcal{S}^+ : f \cdot \mu \le \nu\} \qquad c := \sup\{\mu f : f \in \mathcal{C}\}$$

Choose $f_n \in \mathcal{C}$ so that $\mu f_n \to c$, and let $f := \sup_n f_n$. As in Lemma 16.2.9, we may assume
$f_n \uparrow f$, so that $\mu f_n \uparrow \mu f$ (by MCT), and hence $\mu f = c$. Lemma 16.2.9 implies that $f \in \mathcal{C}$.






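Define $\nu_a := f \cdot \mu$, and, perforce, $\nu_s := \nu - \nu_a$. It is clear that $\nu_a \ll \mu$, so we must check that
$\nu_s \perp \mu$.

Suppose not. Invoke Lemma 16.2.10 to obtain a measurable $g \ge 0$ with $\mu g > 0$ and
$g \cdot \mu \le \nu_s$. Then $f + g \in \mathcal{C}$, because $(f + g) \cdot \mu = \nu_a + g \cdot \mu \le \nu_a + \nu_s = \nu$, and $\mu(f + g) > c$, a
contradiction. Hence $\nu_s \perp \mu$.

We now have a decomposition $\nu = \nu_a + \nu_s$. To prove that it is unique, assume that we
have another decomposition $\nu = \tilde\nu_a + \tilde\nu_s$, with $\tilde\nu_a \ll \mu$, $\tilde\nu_s \perp \mu$. Choose $A, \tilde A \in \mathcal{S}$ such that

$$\nu_s A = \mu A^c = \tilde\nu_s \tilde A = \mu \tilde A^c = 0$$

and let $B = A \cap \tilde A$. Then clearly $\nu_s(B) = 0$. Now $0 \le \mu(A^c \cup \tilde A^c) \le \mu A^c + \mu \tilde A^c = 0$, so that
$\mu(B^c) = 0$ and hence also $\nu_a(B^c) = 0$. It follows that, for $C \in \mathcal{S}$, we have $\nu_a C = \nu_a(B \cap C)$, and thus

$$(1_B \cdot \nu)C = \nu(B \cap C) = \nu_a(B \cap C) = \nu_a C$$

We have therefore shown that

$$\nu_a = 1_B \cdot \nu$$

Similarly $\tilde\nu_s(B) = 0$, $\tilde\nu_a(B^c) = 0$ imply that $\tilde\nu_a = 1_B \cdot \nu$. Hence $\nu_a = \tilde\nu_a$, and $\nu_s = \nu - \nu_a = \tilde\nu_s$.

Next, we must check that the $f$ in $\nu_a := f \cdot \mu$ is unique $\mu$-a.e. Suppose that also $\nu_a = g \cdot \mu$
for some $g \in m\mathcal{S}^+$, and let $h = f - g$. Then $h \cdot \mu = 0$, and hence

$$\int_{\{h > 0\}} h \, d\mu = 0 = \int_{\{h < 0\}} h \, d\mu$$

so that $h = 0$ $\mu$-a.e. by Lemma 6.3.8.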

It remains to drop the assumption that $\mu, \nu$ are finite measures. If they are merely $\sigma$-finite,
we can choose a sequence of mutually disjoint measurable sets $A_n$ with union $\bigcup_n A_n = S$ such
that $\mu A_n, \nu A_n < \infty$ for all $n \in \mathbb{N}$. Applying the above to the measurable space $(A_n, \mathcal{S} \cap A_n)$,
we are able to find a $\mu$-a.e. unique $\mathcal{S} \cap A_n$-measurable $f_n : A_n \to \mathbb{R}$ and a unique $\nu_s^n$ such
that $\nu = f_n \cdot \mu + \nu_s^n$ and $\nu_s^n \perp \mu$ on the measurable space $(A_n, \mathcal{S} \cap A_n)$. Define $f : S \to \mathbb{R}$ by
gluing the $f_n$'s:

$$f(s) := f_n(s) \quad \text{iff} \quad s \in A_n$$

It is easy to see that $f \in m\mathcal{S}^+$. Define $\nu_s := \nu - f \cdot \mu$, and note that if $C \subseteq A_n$ then

$$\nu_s C = \nu C - (f \cdot \mu)C = \nu C - (f_n \cdot \mu)C = \nu_s^n C$$

To see that $\nu_s \perp \mu$ on $(S,\mathcal{S})$ is now straightforward: Choose $B_n \subseteq A_n$ such that $\nu_s^n B_n =
0$, $\mu(A_n - B_n) = 0$ (which exist because $\nu_s^n \perp \mu$ on $(A_n, \mathcal{S} \cap A_n)$), and let $B := \bigcup_n B_n$. Then

$$\nu_s B = \sum_n \nu_s B_n = \sum_n \nu_s^n B_n = 0$$

and, using the disjointness of the $A_n$ and the fact that $B_n \subseteq A_n$,

$$\mu B^c = \mu\Big(\bigcup_n (A_n - B_n)\Big) = \sum_n \mu(A_n - B_n) = 0$$

H 






As a corollary, we have 
Theorem 16.2.11 (Radon-Nikodym) 

Suppose that $\nu, \mu$ are $\sigma$-finite measures on $(S,\mathcal{S})$ and that $\nu \ll \mu$. Then there exists a $\mu$-a.e.
unique $f \in m\mathcal{S}^+$ such that $\nu = f \cdot \mu$.

□ 
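To make the Lebesgue decomposition and the Radon-Nikodym density concrete, the following sketch works on a four-point set where both measures are given by point masses. The measures `mu` and `nu` and all helper names are hypothetical choices for this illustration; in this discrete setting the density is simply a ratio of point masses wherever `mu` charges the point.

```python
from fractions import Fraction

# Hypothetical finite measures on S = {0, 1, 2, 3}, given by point masses.
mu = {0: Fraction(1), 1: Fraction(2), 2: Fraction(0), 3: Fraction(0)}
nu = {0: Fraction(3), 1: Fraction(0), 2: Fraction(5), 3: Fraction(0)}


def measure(m, A):
    """Measure of the subset A under the point-mass measure m."""
    return sum((m[s] for s in A), Fraction(0))


# Density of the absolutely continuous part: f(s) = nu{s} / mu{s} where mu{s} > 0.
# Where mu{s} = 0 the value of f is irrelevant (a mu-null set); we pick 0.
f = {s: (nu[s] / mu[s] if mu[s] > 0 else Fraction(0)) for s in mu}

nu_a = {s: f[s] * mu[s] for s in mu}       # nu_a = f . mu
nu_s = {s: nu[s] - nu_a[s] for s in mu}    # nu_s = nu - nu_a

# nu_s is carried by B = {mu = 0}, which is mu-null, so nu_s is singular to mu.
B = {s for s in mu if mu[s] == 0}
print(nu_a, nu_s, B)
```

If `nu` puts no mass on `B` then `nu_s` vanishes and `nu = f . mu`, which is the Radon-Nikodym situation.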

16.3 Products 
16.3.1 Introduction 

Example 16.3.1 (a) Denote by $\mu A$ the area of a subset $A$ of $\mathbb{R}^2$. We know how to define $\mu$
on rectangles, i.e. sets of the form $A = B_1 \times B_2$, where $B_1, B_2$ are intervals in $\mathbb{R}$: Indeed

$$\mu A = \lambda B_1 \times \lambda B_2 \qquad (*)$$

where $\lambda$ is Lebesgue measure. So $\mu$ is to be a measure on $(\mathbb{R}^2, \mathcal{B}(\mathbb{R}^2))$ such that $\mu(B_1 \times
B_2) = \lambda(B_1)\lambda(B_2)$. Of course, many sets in $\mathcal{B}(\mathbb{R}^2)$ do not have the form $B_1 \times B_2$, and we
would like $\mu$ to be defined for them as well. So (*) cannot serve as a definition of $\mu$.

(b) In probability theory, it is quite natural to consider the product of two probability spaces.
Such products typically model sequences of independent experiments. For example, let
$\Omega_1 = \{H, T\}$, $\mathcal{F}_1 = \mathcal{P}(\Omega_1)$ and let $\mathbb{P}_1\{H\} = \frac{1}{2} = \mathbb{P}_1\{T\}$. Then $(\Omega_1, \mathcal{F}_1, \mathbb{P}_1)$ models the
tossing of a fair coin. Now let $\Omega_2 = \{1, 2, \dots, 6\}$, $\mathcal{F}_2 = \mathcal{P}(\Omega_2)$ and $\mathbb{P}_2\{1\} = \mathbb{P}_2\{2\} = \dots =
\mathbb{P}_2\{6\} = \frac{1}{6}$. Then $(\Omega_2, \mathcal{F}_2, \mathbb{P}_2)$ models the rolling of a fair die. The underlying set of the
probability space which models the combined random experiment "First toss a fair coin,
and then roll a fair die" can clearly be taken to be the cartesian product $\Omega = \Omega_1 \times \Omega_2$. The
natural $\sigma$-algebra will be $\mathcal{F} = \mathcal{P}(\Omega_1 \times \Omega_2)$, and it is not hard to see that this $\sigma$-algebra
is generated by the $\pi$-system $\{B_1 \times B_2 : B_1 \in \mathcal{F}_1, B_2 \in \mathcal{F}_2\}$. Now the event $B_1 \times B_2 \subseteq \Omega$
consists of all those outcomes $\omega = (\omega_1, \omega_2) \in \Omega_1 \times \Omega_2$ having $\omega_1 \in B_1$ and $\omega_2 \in B_2$. Thus
$B_1 \times B_2$ occurs in the combined random experiment iff $B_1$ and $B_2$ occur in each of the
individual experiments.

The probability measure associated with the combined random experiment would
therefore naturally satisfy

$$\mathbb{P}(B_1 \times B_2) = \mathbb{P}_1(B_1)\,\mathbb{P}_2(B_2) \qquad (**)$$

But not every event in $\mathcal{P}(\Omega_1 \times \Omega_2)$ is of the form $B_1 \times B_2$, so (**) cannot serve as a
definition of $\mathbb{P}$.

□ 
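The coin-and-die example is small enough to compute by brute force. The sketch below builds the product probability on the twelve outcomes by multiplying point masses, checks the rectangle rule on one rectangle, and exhibits an event that is not a rectangle. The variable names and the particular events are choices made for this illustration.

```python
from fractions import Fraction
from itertools import product

P1 = {"H": Fraction(1, 2), "T": Fraction(1, 2)}    # fair coin
P2 = {k: Fraction(1, 6) for k in range(1, 7)}      # fair die

# Product probability on Omega_1 x Omega_2: the mass of (w1, w2) is P1{w1} * P2{w2}.
P = {(w1, w2): P1[w1] * P2[w2] for w1, w2 in product(P1, P2)}


def prob(event):
    """Probability of a set of outcomes (w1, w2)."""
    return sum((P[w] for w in event), Fraction(0))


# A measurable rectangle B1 x B2: "heads, and an even number".
B1, B2 = {"H"}, {2, 4, 6}
rectangle = set(product(B1, B2))

# An event that is not of the form B1 x B2: "heads and 1, or tails and 6".
non_rectangle = {("H", 1), ("T", 6)}

print(prob(rectangle), prob(non_rectangle))
```

The second event still receives a probability, because in a finite space every event is a finite disjoint union of one-point rectangles; the general construction has to work harder.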

The aim of this section is to construct, out of two measure spaces $(S,\mathcal{S},\mu)$, $(T,\mathcal{T},\nu)$, a
new measure space $(S \times T, \mathcal{S} \otimes \mathcal{T}, \mu \otimes \nu)$ satisfying the following requirements:

(i) A subset of $S \times T$ is called a measurable rectangle if it has the form $A \times B$, where
$A \in \mathcal{S}$, $B \in \mathcal{T}$.
$\mathcal{S} \otimes \mathcal{T}$ is defined to be the smallest $\sigma$-algebra on $S \times T$ which has all rectangles with
measurable sides as members.






(ii) For each rectangle $A \times B$, we require that $(\mu \otimes \nu)(A \times B) = \mu A \cdot \nu B$.

Remarks 16.3.2 (a) A remark on notation: We will be working with functions of more than
one variable, and may integrate with respect to just one of those variables. We therefore
introduce the following notation:

$$\int f(x)\, \mu(dx) := \mu f =: \mu^x f(x)$$

Thus, for example, $\int f(x,y)\, \mu(dx)$ integrates the function $f(x,y)$ over $x$, keeping $y$ fixed.
The integral $\iint f(x,y)\, \mu(dx)\, \nu(dy)$ is a double integral that first integrates $f$ w.r.t. $\mu$
over the variable $x$, and then integrates the function $y \mapsto \int f(x,y)\, \mu(dx)$ w.r.t. $\nu$ over the
variable $y$. We may also write this as $\nu^y(\mu^x f(x,y))$.

(b) Several times below, we will prove a result for finite measures, and then refer to a "standard
argument" to lift the result to $\sigma$-finite measures. This is done as follows: Suppose
that $\mu$ is $\sigma$-finite on $(S,\mathcal{S})$, and that the result has been proved to hold for finite measures.
Since $\mu$ is $\sigma$-finite, there exists a sequence of measurable sets $A_n \uparrow S$ such that
$\mu A_n < \infty$ for all $n \in \mathbb{N}$. The measures $\mu_n := 1_{A_n} \cdot \mu$ are finite on $(S,\mathcal{S})$, so that the result
holds for the $\mu_n$. By the MCT, if $f \in m\mathcal{S}^+$, then

$$\mu f = \lim_n \mu(1_{A_n} f) = \lim_n \mu_n f$$

This is often enough to show that the result holds for $\mu$ as well.

□ 

16.3.2 Products of Measure Spaces 

Given two measurable spaces $(S,\mathcal{S})$, $(T,\mathcal{T})$, we can construct a $\sigma$-algebra $\mathcal{S} \otimes \mathcal{T}$ on the
cartesian product $S \times T$:

Definition 16.3.3 Let $(S,\mathcal{S})$ and $(T,\mathcal{T})$ be measurable spaces. Define projections $\pi_S : S \times
T \to S$, $\pi_T : S \times T \to T$ by

$$\pi_S : (s,t) \mapsto s \qquad \pi_T : (s,t) \mapsto t$$

Then define $\mathcal{S} \otimes \mathcal{T} := \sigma(\pi_S, \pi_T)$ to be the smallest $\sigma$-algebra for which both projections are
measurable.

□ 

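Exercise 16.3.4 Let $(S,\mathcal{S})$ and $(T,\mathcal{T})$ be measurable spaces, and let $\mathcal{R} := \{A \times B : A \in
\mathcal{S}, B \in \mathcal{T}\}$ be the set of all measurable rectangles. Note that $\mathcal{R}$ is a $\pi$-system. Show that

$$\mathcal{S} \otimes \mathcal{T} = \sigma(\mathcal{R}).$$

Hence the product $\sigma$-algebra is generated by the $\pi$-system of all measurable rectangles.
[Hint: $A \times B = (A \times T) \cap (S \times B)$, and $A \times T = \pi_S^{-1}[A]$.]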



□ 






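Exercise 16.3.5 Show that $\mathcal{B}(\mathbb{R}^2) = \mathcal{B}(\mathbb{R}) \otimes \mathcal{B}(\mathbb{R})$.

[Hint: Using Exercise 16.3.4, it is easy to see that $\mathcal{B}(\mathbb{R}^2) \supseteq \mathcal{B}(\mathbb{R}) \otimes \mathcal{B}(\mathbb{R})$. For the opposite
direction, show that any open set in $\mathbb{R}^2$ can be written as a countable union of sets of the
form $U \times V$, where $U, V$ are open intervals in $\mathbb{R}$.]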

□ 

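Suppose that $(S,\mathcal{S},\mu)$ and $(T,\mathcal{T},\nu)$ are measure spaces. We would like to construct a
measure $\mu \otimes \nu$ on $(S \times T, \mathcal{S} \otimes \mathcal{T})$. One way that suggests itself is to define

(1) $(\mu \otimes \nu)B := \int \Big( \int 1_B(s,t)\, \nu(dt) \Big) \mu(ds) = \mu^s(\nu^t 1_B(s,t)), \qquad B \in \mathcal{S} \otimes \mathcal{T}$

Another is to define it as

(2) $(\mu \otimes \nu)B := \int \Big( \int 1_B(s,t)\, \mu(ds) \Big) \nu(dt) = \nu^t(\mu^s 1_B(s,t)), \qquad B \in \mathcal{S} \otimes \mathcal{T}$

Exercise 16.3.6 Check that

$$(\mu \otimes \nu)(A \times B) = \mu A \cdot \nu B \qquad A \in \mathcal{S},\ B \in \mathcal{T}$$

for both of the above possible definitions of $\mu \otimes \nu$.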

□ 

We shall soon see that (i) the above definitions are both possible, and (ii) they coincide.

We first investigate the possibility of defining $\mu \otimes \nu$ in the above manner. To be able to
perform a double integral $\iint f(s,t)\, \nu(dt)\, \mu(ds)$ it is necessary that:

(i) for each $s \in S$, the map $t \mapsto f(s,t)$ must be $\mathcal{T}$-measurable, so that we can calculate the
inner integral $\int f(s,t)\, \nu(dt)$;

(ii) the map $s \mapsto F(s) := \int f(s,t)\, \nu(dt)$ must be $\mathcal{S}$-measurable, so that we can calculate
the outer integral $\int F(s)\, \mu(ds)$.

The following lemma gives us what we need: 

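Lemma 16.3.7 Suppose that $(S,\mathcal{S})$ and $(T,\mathcal{T})$ are measurable spaces, that $\mu$ is a $\sigma$-finite
measure on $(S,\mathcal{S})$, and that $f : S \times T \to [0,\infty]$ is $\mathcal{S} \otimes \mathcal{T}$-measurable. Then

(i) For each $t \in T$, the map $s \mapsto f(s,t)$ is $\mathcal{S}$-measurable.

(ii) The map $t \mapsto \int f(s,t)\, \mu(ds)$ is $\mathcal{T}$-measurable.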

Proof: We apply the Monotone Class Theorem (Thm. 8.1.5). First assume that $\mu$ is a finite
measure, and let

$$\mathcal{H} = \{f \in m(\mathcal{S} \otimes \mathcal{T}) : f \text{ is bounded and satisfies (i) and (ii)}\}$$

It is easy to verify that $\mathcal{H}$ is a vector space (we need the finiteness of $\mu$ in order to avoid
expressions of the form $\infty - \infty$), and that each $1_{A \times B} \in \mathcal{H}$, where $A \in \mathcal{S}$, $B \in \mathcal{T}$. By
the MCT, $\mathcal{H}$ is closed under bounded limits of increasing non-negative sequences. Moreover,
the set $\mathcal{R} := \{A \times B : A \in \mathcal{S}, B \in \mathcal{T}\}$ is a $\pi$-system with the property that $1_R \in \mathcal{H}$ for
every $R \in \mathcal{R}$, and thus by Thm. 8.1.5 every bounded $\mathcal{S} \otimes \mathcal{T}$-measurable function belongs
to $\mathcal{H}$ (since $\sigma(\mathcal{R}) = \mathcal{S} \otimes \mathcal{T}$). Now each non-negative measurable function $f$ is the limit of
bounded non-negative measurable functions ($f = \lim_n f \wedge n$), and thus another application
of the MCT shows that every $f \in m(\mathcal{S} \otimes \mathcal{T})^+$ satisfies (i) and (ii).

Now drop the assumption that $\mu$ is a finite measure. Because $\mu$ is $\sigma$-finite, we can
choose $A_n \uparrow S$ such that $\mu A_n < \infty$. The measures $\mu_n = 1_{A_n} \cdot \mu$ are finite measures, and
thus each map $t \mapsto \int f(s,t)\, \mu_n(ds)$ is $\mathcal{T}$-measurable (where $f \ge 0$). Since $\int f(s,t)\, \mu(ds) =
\lim_n \int f(s,t)\, \mu_n(ds)$ by the MCT, and a pointwise limit of measurable functions is
measurable, the result holds for $\mu$.

H 

We now know that it is possible to define $\mu \otimes \nu$ in the ways indicated. What we don't
(yet) know is that these constructions define a measure, and that they coincide.
For definiteness, we fix one of the above definitions:

Definition 16.3.8 Suppose that $(S,\mathcal{S},\mu)$ and $(T,\mathcal{T},\nu)$ are $\sigma$-finite measure spaces. Define
a map $\mu \otimes \nu : \mathcal{S} \otimes \mathcal{T} \to [0,\infty]$ by

$$(\mu \otimes \nu)B := \iint 1_B(s,t)\, \nu(dt)\, \mu(ds) = \mu^s(\nu^t 1_B(s,t)) \qquad B \in \mathcal{S} \otimes \mathcal{T}$$

$\mu \otimes \nu$ is called the product measure of $\mu, \nu$.

□

Exercise 16.3.9 Show that $\mu \otimes \nu$ defines a $\sigma$-finite measure on $(S \times T, \mathcal{S} \otimes \mathcal{T})$.

□ 



The next two results show that (modulo certain conditions) we can calculate the integral
w.r.t. $\mu \otimes \nu$ as a double integral, and that the order of integration doesn't matter:

$$\int f \, d(\mu \otimes \nu) = \iint f(s,t)\, \nu(dt)\, \mu(ds) = \iint f(s,t)\, \mu(ds)\, \nu(dt)$$

We first show this for non-negative measurable functions:
Theorem 16.3.10 (Tonelli)

Suppose that $(S,\mathcal{S},\mu)$ and $(T,\mathcal{T},\nu)$ are $\sigma$-finite measure spaces. If $f \in m(\mathcal{S} \otimes \mathcal{T})^+$, then

$$(\mu \otimes \nu)f = \mu^s(\nu^t f(s,t)) = \nu^t(\mu^s f(s,t)) \qquad (*)$$

Proof: We use the Monotone Class Theorem (Thm. 8.1.5). First assume that $\mu, \nu$ are finite
measures. The result is obvious if $f = 1_{A \times B}$, where $A \times B$ is a measurable rectangle (cf.
Exercise 16.3.6). The class

$$\mathcal{H} = \{f \in m(\mathcal{S} \otimes \mathcal{T}) : f \text{ is bounded and satisfies } (*)\}$$

is easily seen to satisfy the requirements of the Monotone Class Theorem, which thus implies that $\mathcal{H}$ contains
every bounded $\mathcal{S} \otimes \mathcal{T}$-measurable function. The result for arbitrary non-negative $f$ follows
by MCT.

A standard argument lifts the result to the case where $\mu, \nu$ are merely $\sigma$-finite.






H 

As a by-product, we obtain the result that our two possible definitions of $\mu \otimes \nu$ as iterated
integrals coincide: If $B \in \mathcal{S} \otimes \mathcal{T}$, then $1_B$ is a non-negative measurable function, and we may
apply Tonelli's Thm.
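On finite sets with point-mass measures, integrals are finite weighted sums and Tonelli's identity reduces to exchanging the order of summation. The sketch below checks this for one arbitrary non-negative function; the measures `mu`, `nu` and the function `f` are hypothetical and chosen only for illustration.

```python
from fractions import Fraction

# Hypothetical finite measures on S = {s1, s2} and T = {t1, t2, t3} (point masses).
mu = {"s1": Fraction(1, 2), "s2": Fraction(3)}
nu = {"t1": Fraction(2), "t2": Fraction(1, 3), "t3": Fraction(1)}


def f(s, t):
    """An arbitrary non-negative function on S x T."""
    return Fraction(int(s[1]) + int(t[1]) ** 2)


# mu^s(nu^t f(s, t)): integrate over t first, then over s.
t_first = sum(mu[s] * sum(nu[t] * f(s, t) for t in nu) for s in mu)

# nu^t(mu^s f(s, t)): integrate over s first, then over t.
s_first = sum(nu[t] * sum(mu[s] * f(s, t) for s in mu) for t in nu)

# Integral against the product measure, whose point masses are mu{s} * nu{t}.
product_integral = sum(mu[s] * nu[t] * f(s, t) for s in mu for t in nu)

print(t_first, s_first, product_integral)
```

All three numbers agree here simply because finite sums can be rearranged; the content of the theorem is that the same holds for $\sigma$-finite measures and arbitrary non-negative measurable functions.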

For non-negative functions $f$, the integral $\mu f$ always makes sense, but we may have
$\mu f = \infty$. For arbitrary measurable $f$, we have to be more careful.

Theorem 16.3.11 (Fubini) 

Suppose that $(S,\mathcal{S},\mu)$ and $(T,\mathcal{T},\nu)$ are $\sigma$-finite measure spaces. If $f \in \mathcal{L}^1(S \times T, \mathcal{S} \otimes \mathcal{T}, \mu \otimes \nu)$,
then

$$(\mu \otimes \nu)f = \mu^s(\nu^t f(s,t)) = \nu^t(\mu^s f(s,t))$$

Here the map $t \mapsto \mu^s f(s,t)$ is defined for $\nu$-a.e. $t \in T$ and belongs to $\mathcal{L}^1(T,\mathcal{T},\nu)$. Similarly, the map
$s \mapsto \nu^t f(s,t)$ is defined for $\mu$-a.e. $s \in S$ and belongs to $\mathcal{L}^1(S,\mathcal{S},\mu)$.

Proof: The result holds for $|f|$, by Tonelli's Thm., and hence $N_S = \{s \in S : \nu^t|f(s,t)| = +\infty\}$
is $\mu$-null, and $N_T = \{t \in T : \mu^s|f(s,t)| = +\infty\}$ is $\nu$-null. Redefine $f(s,t)$ to be zero when
either $s \in N_S$ or $t \in N_T$; this won't affect the integral of $f$, by Thm. 6.3.9. The result follows
by splitting $f$ into positive and negative parts.

H 



Remarks 16.3.12 (a) Fubini's Theorem allows the interchange of the order of integration,
provided the integrand is integrable w.r.t. the product measure. It follows from Fubini's
Theorem that

$$\int \Big( \int f \, d\mu \Big) d\nu = \int \Big( \int f \, d\nu \Big) d\mu$$

provided that $f \in \mathcal{L}^1$. See Exercise 16.3.13 for what can happen if $f \notin \mathcal{L}^1$.

(b) Fubini's Theorem also easily extends to arbitrary finite products: If $(S_i, \mathcal{S}_i, \mu_i)$ are $\sigma$-finite
measure spaces for $i = 1, \dots, n$, then

(i) $\mathcal{S}_1 \otimes \dots \otimes \mathcal{S}_n$ is the $\sigma$-algebra on $S_1 \times \dots \times S_n$ which is generated by the projections
$\pi_i : S_1 \times \dots \times S_n \to S_i : (s_1, \dots, s_n) \mapsto s_i$. It is also generated by the family of
measurable "rectangles" $\mathcal{R} = \{A_1 \times \dots \times A_n : A_i \in \mathcal{S}_i \text{ for } i = 1, \dots, n\}$.

(ii) $\mu_1 \otimes \dots \otimes \mu_n$ is the unique measure on $\mathcal{S}_1 \otimes \dots \otimes \mathcal{S}_n$ which assigns to every rectangle
the measure

$$(\mu_1 \otimes \dots \otimes \mu_n)(A_1 \times \dots \times A_n) = \mu_1 A_1 \cdots \mu_n A_n$$

(iii) Fubini's Theorem states that if $f : S_1 \times \dots \times S_n \to \mathbb{R}$ is $\mu_1 \otimes \dots \otimes \mu_n$-integrable,
then

$$\int_{S_1 \times \dots \times S_n} f \, d(\mu_1 \otimes \dots \otimes \mu_n) = \int_{S_1} \Big( \int_{S_2} \Big( \cdots \int_{S_n} f \, d\mu_n \cdots \Big) d\mu_2 \Big) d\mu_1$$

and that any interchange of the order of integration is permissible.

□ 






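Exercise 16.3.13 Let

$$f(x,y) = \frac{x^2 - y^2}{(x^2 + y^2)^2}$$

Show that

$$\int_0^1 \int_0^1 f(x,y)\, \lambda(dy)\, \lambda(dx) = -\int_0^1 \int_0^1 f(x,y)\, \lambda(dx)\, \lambda(dy)$$

What can you conclude about

$$\int_{[0,1] \times [0,1]} f \, d(\lambda \otimes \lambda)\ ?$$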

□ 
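A numerical sanity check can accompany the exercise. The sketch below assumes that the integrand is the standard example $f(x,y) = (x^2 - y^2)/(x^2 + y^2)^2$ (an assumption, since the formula is damaged in the extracted text). It uses the fact that $y/(x^2 + y^2)$ is an antiderivative of $f$ in $y$, so the inner integral has a closed form, and then approximates the outer integral with a midpoint rule. The function names and the step count are choices for this illustration.

```python
import math


def inner_dy(x):
    """Inner integral of f(x, y) over y in [0, 1], for fixed x > 0.

    Since d/dy [ y / (x^2 + y^2) ] = (x^2 - y^2) / (x^2 + y^2)^2,
    the inner integral equals 1 / (1 + x^2).
    """
    return 1.0 / (1.0 + x * x)


def midpoint_rule(g, a, b, n=100_000):
    """Approximate the integral of g over [a, b] by the midpoint rule."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))


# Integrate over y first, then over x.
dy_then_dx = midpoint_rule(inner_dy, 0.0, 1.0)

# f(x, y) = -f(y, x), so integrating over x first gives the opposite sign.
dx_then_dy = -dy_then_dx

print(dy_then_dx, dx_then_dy, math.pi / 4)
```

The two iterated integrals differ, which is only compatible with Fubini's Theorem if the integrand fails its hypothesis.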



Appendix A 

Convergence in R 



Some of the notions described in this chapter should be thoroughly familiar already. We 
include them here as preparation for the related, but more demanding, notions that we will 
encounter in the future. 

A.1 Definition of Convergence

Let $P$ be a property that a real number may (or may not) have. We write $P(x)$ if $x$ has the
property $P$. [For example, $P$ could be the property of being positive, so that $P(1.23)$, but
$\neg P(-\pi)$. Or $Q$ could be the property of being irrational, in which case $\neg Q(1.23)$, whereas
$Q(-\pi)$.] Now suppose that $(x_n)_n$ is a sequence in $\mathbb{R}$, and that $P$ is a property:

• We say $(x_n)_n$ has property $P$ infinitely often iff there are infinitely many $n$ such that
$P(x_n)$ is true.

• We say $(x_n)_n$ has property $P$ eventually iff $P(x_n)$ is true for all $n$ from some point
onwards.

This can be made precise:

$$(x_n)_n \text{ has property } P \text{ infinitely often} \iff \forall n\, \exists m \ge n\ P(x_m)$$
$$(x_n)_n \text{ has property } P \text{ eventually} \iff \exists n\, \forall m \ge n\ P(x_m)$$

Exercise A.1.1 Show that $\neg(\forall x P(x)) \iff \exists x(\neg P(x))$ and that $\neg(\exists x P(x)) \iff \forall x(\neg P(x))$.
Conclude that $\neg(P(x_n), \text{i.o.}) \iff (\neg P(x_n), \text{ev.})$.

□ 

We first recall what it means for a sequence {xn)n of non-negative real numbers to converge 
to zero: 



To say that $x_n \to 0$ means that $x_n$ is "small" eventually.






The notion "small" is subjective, so we will demand that it holds for absolutely anybody's
idea of "small". Specifically, suppose you define "small" by specifying some number $\varepsilon > 0$
and saying "A non-negative number $x$ is small iff $x < \varepsilon$". To say that $(x_n)_n$ is eventually
small then means that from some point onwards all the $x_n$ are small, i.e.

$$\exists N\, \forall n \ge N\ [x_n < \varepsilon]$$

This must be true no matter what gauge $\varepsilon > 0$ of "smallness" you use. Thus:

If $(x_n)_n$ is a sequence of non-negative real numbers, we say

$$x_n \to 0 \iff \forall \varepsilon > 0\ \exists N\, \forall n \ge N\ [x_n < \varepsilon]$$

Thus $x_n \to 0$ iff given any $\varepsilon > 0$ it is possible to find a natural number $N$ such that

$$x_n < \varepsilon \text{ whenever } n \ge N$$

The number $N$ typically depends on $\varepsilon$. The smaller $\varepsilon > 0$, the greater $N$ usually has to be.

It is now simple to define convergence of arbitrary sequences in $\mathbb{R}$: To say that $x_n \to x$
means that the distance between $x_n$ and $x$ converges to 0, i.e.

$$|x_n - x| \to 0$$

Now the distance $|x_n - x|$ between $x_n$ and $x$ is non-negative, so we already know what
$|x_n - x| \to 0$ means: It means $\forall \varepsilon > 0\ \exists N\, \forall n \ge N\ [|x_n - x| < \varepsilon]$. Thus

If $(x_n)_n$ is a sequence of real numbers, we say

$$x_n \to x \iff \forall \varepsilon > 0\ \exists N\, \forall n \ge N\ [|x_n - x| < \varepsilon]$$

We also write

$$x = \lim_n x_n \quad \text{for} \quad x_n \to x$$

Thus $x_n \to x$ iff given any $\varepsilon > 0$ it is possible to find a natural number $N$ such that

$$|x_n - x| < \varepsilon \text{ whenever } n \ge N$$

The number $N$ typically depends on $\varepsilon$. The smaller $\varepsilon > 0$, the greater $N$ usually has to be.
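
The dependence of $N$ on $\varepsilon$ can be made explicit for a concrete sequence. The sketch below takes $x_n = 1/n$ (an example chosen here, not taken from the text) and returns one valid $N$ for a given $\varepsilon$, using exact rational arithmetic so that no rounding issues interfere. The function names are hypothetical.

```python
import math
from fractions import Fraction


def x(n):
    """The sequence x_n = 1/n, which converges to 0."""
    return Fraction(1, n)


def find_N(eps):
    """Return an N such that |x_n - 0| < eps for all n >= N.

    N = floor(1/eps) + 1 is strictly greater than 1/eps, so 1/n <= 1/N < eps
    for every n >= N.
    """
    return math.floor(1 / eps) + 1


for eps in (Fraction(1, 2), Fraction(1, 10), Fraction(1, 1000)):
    N = find_N(eps)
    print(eps, N, all(x(n) < eps for n in range(N, N + 1000)))
```

Smaller values of $\varepsilon$ produce larger values of $N$, as the remark above suggests. The finite loop only spot-checks the claim; the docstring contains the actual argument.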
We also sometimes say that a sequence converges to $\pm\infty$.

To say that $x_n \to \infty$ means $(x_n)_n$ is "large" eventually.






The notion "large" is subjective, so we will demand that it holds for absolutely anybody's
idea of "large". Specifically, suppose you define "large" by specifying some number $K > 0$
and saying "A number $x$ is large iff $x > K$". To say that $(x_n)_n$ is eventually large then means
that from some point onwards all the $x_n$ are large, i.e.

$$\exists N\, \forall n \ge N\ [x_n > K]$$

This must be true no matter what gauge $K > 0$ of "largeness" you use. Thus:

If $(x_n)_n$ is a sequence of real numbers, we say

$$x_n \to \infty \iff \forall K > 0\ \exists N\, \forall n \ge N\ [x_n > K]$$

We say that $x_n \to -\infty$ iff $-x_n \to +\infty$.

When $x_n \to \pm\infty$, then $\lim_n x_n$ does not exist (as a real number). We then say that $\lim_n x_n$
exists in the extended sense.

The next theorem is left as an exercise: 



Theorem A.1.2 (Sandwich Theorem)

Suppose that $(x_n)_n$, $(y_n)_n$ and $(z_n)_n$ are sequences in $\mathbb{R}$ which satisfy the following
conditions:

(i) $x_n \le y_n \le z_n$ for all $n \in \mathbb{N}$ (or merely eventually);
(ii) There is $l \in \mathbb{R}$ such that $x_n \to l$ and $z_n \to l$.
Then also $y_n \to l$.



Exercise A.1.3 The aim of this exercise is to prove the Sandwich Theorem. So let $\varepsilon > 0$. We must show
that there is $N \in \mathbb{N}$ such that $|y_n - l| < \varepsilon$ whenever $n \ge N$, or equivalently, that $l - \varepsilon < y_n < l + \varepsilon$ whenever
$n \ge N$.

(a) Assume first that $x_n \le y_n \le z_n$ for all $n \in \mathbb{N}$. Explain why there is $N_1 \in \mathbb{N}$ such that whenever $n \ge N_1$,
we have $l - \varepsilon < x_n < l + \varepsilon$.

(b) Now explain why there is an $N \in \mathbb{N}$ such that whenever $n \ge N$, we have both $l - \varepsilon < x_n < l + \varepsilon$ and
$l - \varepsilon < z_n < l + \varepsilon$.

(c) Now explain why also $l - \varepsilon < y_n < l + \varepsilon$ whenever $n \ge N$.

(d) The Theorem has now been proved for the case where $x_n \le y_n \le z_n$ for all $n \in \mathbb{N}$. Modify your proof
slightly to show that the Theorem remains true if we have $x_n \le y_n \le z_n$ eventually.

□ 

You should also be familiar with the following theorem, whose proof we leave as an exercise.
(It can be found in any introductory text on Real Analysis.)






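Theorem A.1.4 Let $(x_n)_n$, $(y_n)_n$ be sequences in $\mathbb{R}$, with $x_n \to x$, $y_n \to y$. Also let
$a \in \mathbb{R}$. Then

(a) $(x_n + y_n) \to x + y$;

(b) $a x_n \to a x$;

(c) $x_n y_n \to x y$;

(d) $\frac{1}{x_n} \to \frac{1}{x}$ (provided $x_n \ne 0$ for $n \in \mathbb{N}$, and $x \ne 0$);

(e) $\frac{x_n}{y_n} \to \frac{x}{y}$ (provided $y_n \ne 0$ for $n \in \mathbb{N}$, and $y \ne 0$).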



A.2 The Completeness Axiom

Much of the power of real analysis comes from the ability to construct objects as certain
limits. That these limits exist is a consequence of the completeness axiom. In order to state
this axiom, we need some preliminary definitions:

Definition A.2.1 Let $A \subseteq \mathbb{R}$.

(a) We say that $u \in \mathbb{R}$ is an upper bound for $A$ iff every element of $A$ is $\le u$ (i.e. iff
$\forall a \in A\, [a \le u]$).

(b) We say that $A \subseteq \mathbb{R}$ is bounded above iff $A$ has a (finite) upper bound.

(c) Similarly, we say that $l \in \mathbb{R}$ is a lower bound of $A \subseteq \mathbb{R}$ iff $\forall a \in A\, (l \le a)$. If $A$ has
a (finite) lower bound, it is bounded below. A set which is both bounded above and
below is said to be bounded.

(d) We say that $u_0$ is the supremum (or least upper bound) of $A \subseteq \mathbb{R}$ iff

(i) $u_0$ is an upper bound of $A$, and

(ii) If $u$ is an upper bound of $A$, then $u_0 \le u$ (i.e. $u_0$ is an upper bound which is $\le$
any other upper bound).

We write this as

$$u_0 = \sup A$$

(e) Similarly, we say that $l_0$ is the infimum (or greatest lower bound) of $A \subseteq \mathbb{R}$ iff

(i) $l_0$ is a lower bound of $A$, and

(ii) If $l$ is a lower bound of $A$, then $l_0 \ge l$ (i.e. $l_0$ is a lower bound which is $\ge$ any
other lower bound).

We write this as

$$l_0 = \inf A$$

Remarks A.2.2 1. Upper bounds, if they exist, are not unique: If $u$ is an upper bound of
$A$ and $v > u$, then $v$ is also an upper bound of $A$.

2. The notions of sup, inf generalize the notions of max, min: If $x = \max A$, then $x = \sup A$.
However, a set $A \subseteq \mathbb{R}$ may have a supremum without having a maximum.

3. The following statements are obvious, but often useful:

$$x < \sup A \iff \exists a \in A\, [x < a]$$

$$x > \inf A \iff \exists a \in A\, [x > a]$$

4. If $A$ has no (finite) upper bound, we may write $\sup A = \infty$.

5. If $A \subseteq \mathbb{R}$, then $\sup(-A) = -\inf A$ (where $-A := \{-a : a \in A\}$).

□ 

Here is the fundamental axiom of analysis: 



Completeness Axiom Every non-empty subset of $\mathbb{R}$ which has an upper bound has a
least upper bound, i.e. if $A \subseteq \mathbb{R}$ is non-empty and bounded above then $\sup A$ exists.

Using Remarks A.2.2.5, it is easy to see that every non-empty subset of $\mathbb{R}$ which has a lower bound has
a greatest lower bound.

Remarks A.2.3 1. Note that the completeness axiom fails for $\mathbb{Q}$: It is not true that every
non-empty bounded set of rational numbers has a rational least upper bound. To see this, take

$$A := \{x \in \mathbb{Q} : x^2 < 2\}$$

This set does have a supremum in $\mathbb{R}$, namely $\sup A = \sqrt{2}$. However, $\sqrt{2}$ is irrational.

2. Why should we believe this axiom to be true for $\mathbb{R}$? Here is a thought experiment that
provides some intuition: Suppose that $A$ is a bounded set of non-negative reals. For each $a \in A$,
take a line segment of length $a$. Stack all these line segments on top of one another, on a
blank page, with bottom points aligned. Since the line segments have zero thickness, all
you will see is a single line segment. This "line segment" ought to have a length, and a
little thought will convince you that this length is $\sup A$.

3. If $A$ is non-empty, we obviously have $\inf A \le \sup A$. Yet

$$\inf \emptyset = \infty \qquad \sup \emptyset = -\infty$$

(Explain!)

□ 
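The failure of completeness in $\mathbb{Q}$ can be explored with exact rational arithmetic. The sketch below runs a bisection (a method chosen for this illustration, not taken from the text) that keeps a rational `lo` inside $A = \{x \in \mathbb{Q} : x^2 < 2\}$ and a rational upper bound `hi` of $A$, halving the gap at each step. Every rational upper bound found this way is beaten by a smaller one, and no rational ever squares to exactly 2.

```python
from fractions import Fraction


def in_A(q):
    """Membership test for A = {x in Q : x^2 < 2}."""
    return q * q < 2


lo, hi = Fraction(1), Fraction(2)   # lo lies in A; hi is an upper bound of A
for _ in range(30):
    mid = (lo + hi) / 2
    if in_A(mid):
        lo = mid    # mid belongs to A, so no upper bound of A lies below mid
    else:
        hi = mid    # mid > 0 and mid^2 >= 2, so every x in A satisfies x < mid

print(float(lo), float(hi), hi - lo)
```

After thirty steps the element of $A$ and the upper bound differ by $2^{-30}$, yet neither is a least upper bound within $\mathbb{Q}$.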

The following theorem is basic: Recall that a sequence $(x_n)_n$ of real numbers is increasing if
and only if $x_1 \le x_2 \le x_3 \le \dots$, i.e. iff $m \le n \Rightarrow x_m \le x_n$.






Theorem A.2.4 If $(x_n)_n$ is an increasing sequence of real numbers which is bounded above,
then $(x_n)_n$ converges, and

$$\lim_n x_n = \sup_n x_n$$

Proof: Because $\{x_n : n \in \mathbb{N}\}$ is bounded above, $x := \sup\{x_n : n \in \mathbb{N}\}$ exists. We show
$x_n \to x$.

Let $\varepsilon > 0$. Then $x - \varepsilon < \sup\{x_n : n \in \mathbb{N}\}$. Hence there is $N \in \mathbb{N}$ such that $x - \varepsilon < x_N$
(cf. Remarks A.2.2.3). If $n \ge N$ then $x_n \ge x_N$, and hence $x - \varepsilon < x_n \le x$. (All the $x_n$ are
$\le x$, because $x$ is an upper bound of the $x_n$.) It follows that $|x_n - x| = x - x_n < \varepsilon$ whenever
$n \ge N$.

H 

By Remarks A.2.2.5 it follows that every decreasing sequence which is bounded
below has a limit, and that this limit is $\inf_n x_n$.

Remarks A.2.5 If $(x_n)_n$ is an increasing sequence which converges to $x$, we often write

$$x_n \uparrow x \quad \text{or} \quad x = \uparrow\lim_n x_n$$

instead of $x_n \to x$. Similarly, if $(x_n)_n$ is a decreasing convergent sequence, we write

$$x_n \downarrow x \quad \text{or} \quad x = \downarrow\lim_n x_n$$

□ 

Note that the preceding theorem guarantees that a bounded monotone sequence has a 
limit, even if we have no way of directly assessing what that limit might actually be. This is 
the case in the following example: 

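Example A.2.6 Define $x_n = (1 + \frac{1}{n})^n$. We show that $(x_n)_n$ converges. By the preceding theorem, it
suffices to show (i) that $(x_n)_n$ is increasing, and (ii) that it is bounded. Now by the Binomial Theorem

$$x_n = 1 + \frac{n}{1} \cdot \frac{1}{n} + \frac{n(n-1)}{2!} \cdot \frac{1}{n^2} + \dots + \frac{n(n-1) \cdots 2 \cdot 1}{n!} \cdot \frac{1}{n^n}$$

$$\phantom{x_n} = 1 + 1 + \frac{1}{2!}\Big(1 - \frac{1}{n}\Big) + \dots + \frac{1}{n!}\Big(1 - \frac{1}{n}\Big)\Big(1 - \frac{2}{n}\Big) \cdots \Big(1 - \frac{n-1}{n}\Big)$$

Similarly,

$$x_{n+1} = 1 + 1 + \frac{1}{2!}\Big(1 - \frac{1}{n+1}\Big) + \dots + \frac{1}{(n+1)!}\Big(1 - \frac{1}{n+1}\Big)\Big(1 - \frac{2}{n+1}\Big) \cdots \Big(1 - \frac{n}{n+1}\Big)$$

Now we compare terms. $x_n$ has $n + 1$ terms, whereas $x_{n+1}$ has $n + 2$ terms, all non-negative. $x_n$ and $x_{n+1}$
agree on the first two terms. Now if $2 < k \le n + 1$, then the $k$th term of $x_n$ is

$$\frac{1}{(k-1)!}\Big(1 - \frac{1}{n}\Big)\Big(1 - \frac{2}{n}\Big) \cdots \Big(1 - \frac{k-2}{n}\Big)$$

whereas the $k$th term of $x_{n+1}$ is

$$\frac{1}{(k-1)!}\Big(1 - \frac{1}{n+1}\Big)\Big(1 - \frac{2}{n+1}\Big) \cdots \Big(1 - \frac{k-2}{n+1}\Big)$$

It is therefore clear that the $k$th term of $x_n$ is less than the $k$th term of $x_{n+1}$. Moreover, $x_{n+1}$ has one more
term, which is strictly positive. It follows that $x_n < x_{n+1}$.
Thus $(x_n)_n$ is an increasing sequence.

Next, we show that $(x_n)_n$ is bounded. Look again at the $k$th term of $x_n$: We have

$$\frac{1}{(k-1)!}\Big(1 - \frac{1}{n}\Big)\Big(1 - \frac{2}{n}\Big) \cdots \Big(1 - \frac{k-2}{n}\Big) \le \frac{1}{(k-1)!} \le \frac{1}{2^{k-2}}$$

This is (i), because

$$\Big(1 - \frac{1}{n}\Big)\Big(1 - \frac{2}{n}\Big) \cdots \Big(1 - \frac{k-2}{n}\Big) \le 1$$

and (ii), because

$$2^{k-2} = 2 \cdot 2 \cdots 2 \le 2 \cdot 3 \cdots (k-1) = (k-1)!$$

It follows that

$$x_n \le 1 + 1 + \frac{1}{2} + \frac{1}{4} + \dots + \frac{1}{2^{n-1}} < 3$$

and thus that $x_n < 3$ for all $n$. Hence $(x_n)_n$ is bounded.

We can now conclude that $(x_n)_n$ converges, though we do not yet know precisely where it converges to.

If you stick the $x_n$ into a calculator, you will see that $x_n \to e$, where $e = 2.7182818\ldots$ is the base of the
natural logarithm.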

□ 
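The two properties established in the example can be spot-checked with exact rational arithmetic. The sketch below computes the first fifty terms of $x_n = (1 + \frac{1}{n})^n$ as fractions and tests that they increase and stay below 3. The range of fifty terms is an arbitrary choice, and a finite check is of course no substitute for the argument above.

```python
from fractions import Fraction


def x(n):
    """x_n = (1 + 1/n)^n as an exact rational number."""
    return Fraction(n + 1, n) ** n


terms = [x(n) for n in range(1, 51)]

# (i) strictly increasing and (ii) bounded above by 3, on the computed range.
increasing = all(a < b for a, b in zip(terms, terms[1:]))
bounded = all(t < 3 for t in terms)

print(increasing, bounded, float(terms[0]), float(terms[-1]))
```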



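A.3 limsup and liminf; Subsequences

Suppose that $(x_n)_n$ is a bounded sequence in $\mathbb{R}$. Construct two new sequences as follows:

$$y_n = \sup\{x_m : m \ge n\} \qquad z_n = \inf\{x_m : m \ge n\}$$

Because $(x_n)$ is bounded, $y_n$ and $z_n$ exist (i.e. are finite real numbers), by the Completeness
Axiom.

Suppose, for example, that $x_n = \frac{(-1)^n}{n}$ for $n \ge 1$. Then

$$y_1 = \sup\{-1, \tfrac{1}{2}, -\tfrac{1}{3}, \tfrac{1}{4}, -\tfrac{1}{5}, \dots\} = \tfrac{1}{2}$$
$$y_2 = \sup\{\tfrac{1}{2}, -\tfrac{1}{3}, \tfrac{1}{4}, -\tfrac{1}{5}, \dots\} = \tfrac{1}{2}$$
$$y_3 = \sup\{-\tfrac{1}{3}, \tfrac{1}{4}, -\tfrac{1}{5}, \dots\} = \tfrac{1}{4}$$
$$y_4 = \sup\{\tfrac{1}{4}, -\tfrac{1}{5}, \dots\} = \tfrac{1}{4}$$

i.e. $(y_n)$ is the sequence $\tfrac{1}{2}, \tfrac{1}{2}, \tfrac{1}{4}, \tfrac{1}{4}, \tfrac{1}{6}, \tfrac{1}{6}, \dots$.

Similarly, you can check that $(z_n)$ is the sequence $-1, -\tfrac{1}{3}, -\tfrac{1}{3}, -\tfrac{1}{5}, -\tfrac{1}{5}, -\tfrac{1}{7}, \dots$.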
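
The tail suprema and infima of $x_n = \frac{(-1)^n}{n}$ can be tabulated mechanically. The sketch below relies on a fact specific to this sequence: the terms alternate in sign and shrink in absolute value, so the supremum and infimum of each tail are attained among its first two terms. That shortcut, and the function names, are assumptions of this illustration and do not apply to general sequences.

```python
from fractions import Fraction


def x(n):
    """x_n = (-1)^n / n for n >= 1."""
    return Fraction((-1) ** n, n)


def y(n):
    """sup{x_m : m >= n}; for this sequence it is attained at m = n or m = n + 1."""
    return max(x(n), x(n + 1))


def z(n):
    """inf{x_m : m >= n}; for this sequence it is attained at m = n or m = n + 1."""
    return min(x(n), x(n + 1))


ys = [y(n) for n in range(1, 7)]
zs = [z(n) for n in range(1, 7)]
print([str(v) for v in ys])
print([str(v) for v in zs])
```

Both lists are monotone and approach 0, which is the common value of the limit superior and limit inferior of this sequence.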

Exercise A.3.1 (a) Given $(x_n)_n$, write down the first 6 terms of $y_n = \sup\{x_m : m \ge n\}$ and $z_n =
\inf\{x_m : m \ge n\}$.

(i) $x_n = (-1)^n$

(ii) $x_n = \frac{1}{n}$
















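(iii) $x_n = \begin{cases} \frac{1}{n} & \text{if } n \text{ is } \dots \\ 2^{-n} & \text{if } n \text{ is } \dots \end{cases}$

(iv) $x_n = \begin{cases} 1 - \frac{1}{n} & \text{if } n \text{ is even} \\ -1 + 2^{-n} & \text{if } n \text{ is odd} \end{cases}$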



(b) Note that each of the above sequences $(y_n)_n$ is decreasing, and that each of the $(z_n)_n$ is increasing. Can
you explain why?

(c) Finally, since the $(y_n)$ and $(z_n)$ are bounded monotone sequences, they must converge (by Theorem A.2.4).
Write down $\lim_n y_n$ and $\lim_n z_n$ for each of the sequences in (a)(i)-(iv).






□ 

As noted in the above exercise, $(y_n)$ is a decreasing sequence, and $(z_n)$ is increasing. To
see this, let $A_n = \{x_m : m \ge n\}$. Clearly

$$A_1 \supseteq A_2 \supseteq A_3 \supseteq \dots$$

Hence

$$\sup A_1 \ge \sup A_2 \ge \sup A_3 \ge \dots \quad \text{and} \quad \inf A_1 \le \inf A_2 \le \inf A_3 \le \dots$$

(Note that if $A \subseteq B$, then $\sup A \le \sup B$, and $\inf A \ge \inf B$.)
Since $y_n = \sup A_n$ and $z_n = \inf A_n$, we see that

$$y_1 \ge y_2 \ge y_3 \ge \dots \quad \text{and} \quad z_1 \le z_2 \le z_3 \le \dots$$

Now any bounded monotone sequence converges (Theorem A.2.4), and thus $\lim_n y_n$ and
$\lim_n z_n$ exist if $(x_n)$ is bounded. We now define $\limsup_n x_n = \lim_n y_n$, and $\liminf_n x_n =
\lim_n z_n$:



Definition A.3.2 Let $(x_n)$ be a sequence in $\mathbb{R}$. We define the limit superior of $(x_n)$ by

$$\limsup_n x_n = \lim_{n \to \infty} \sup_{m \ge n} x_m$$

where we adopt the convention that if $(x_n)$ is not bounded above, we set $\limsup_n x_n = +\infty$.

Similarly, we define the limit inferior of $(x_n)$ by

$$\liminf_n x_n = \lim_{n \to \infty} \inf_{m \ge n} x_m$$

where we adopt the convention that if $(x_n)$ is not bounded below, we set $\liminf_n x_n = -\infty$.



Because $(y_n)$ is decreasing, we have $\limsup_n x_n = \downarrow\lim_n y_n = \inf_n y_n$ (by Theorem A.2.4).
Hence (and similarly)

$$\limsup_n x_n = \inf_n \sup_{m \ge n} x_m \qquad \liminf_n x_n = \sup_n \inf_{m \ge n} x_m$$

The notions of lim sup and lim inf are often regarded as quite difficult, so we will try to
improve our understanding of them. Note that since

$$\liminf_n x_n = \sup_n \inf_{m \ge n} x_m = -\inf_n \sup_{m \ge n} (-x_m) = -\limsup_n (-x_n)$$

(because $-\sup A = \inf(-A)$), we need to prove a result only for lim sup in order to obtain
immediately a corresponding result for lim inf.






Proposition A.3.3 If $(x_n)$ is bounded above, then

(a) $x < \limsup_n x_n \implies x < x_n$ infinitely often

(b) $x < x_n$ infinitely often $\implies x \le \limsup_n x_n$

(c) $x > \limsup_n x_n \implies x \ge x_n$ eventually

(d) $x \ge x_n$ eventually $\implies x \ge \limsup_n x_n$

(e) $\limsup_n x_n = \sup\{x : x_n > x \text{ infinitely often}\} = \inf\{x : x_n < x \text{ eventually}\}$

The proof is an exercise:
Exercise A.3.4 We prove Propn. A.3.3.

Define $y_n = \sup\{x_m : m \ge n\}$, and let $\bar{x} := \limsup_n x_n = \downarrow\lim_n y_n$.

(a) Suppose that $x < \bar{x}$. Show that $y_n > x$ for all $n$. Conclude that $\forall n\, \exists m \ge n\, (x_m > x)$.

(b) Suppose that $x < x_n$ i.o. Explain why $y_n > x$ for all $n$. Conclude that $\bar{x} \ge x$.

(c) Use (b) and some logic: $x > \bar{x}$ implies $x \not\le \bar{x}$, so $x \not< x_n$ i.o., and hence $x \ge x_n$ ev.

(d) Use (a) and logic.

(e) Let $A := \{x : x_n > x \text{ i.o.}\}$. Use (b) to show $\sup A \le \bar{x}$. Use (a) to show $\sup A \ge \bar{x}$ by considering an $x$
such that $\sup A < x < \bar{x}$.

Proposition A.3.5 Suppose that $(x_n)_n$, $(y_n)_n$ are bounded sequences in $\mathbb{R}$, and that $\lambda \in
\mathbb{R}$.

(a) $\liminf_n x_n \le \limsup_n x_n$

(b) If $\lambda \ge 0$, then $\limsup_n \lambda x_n = \lambda \limsup_n x_n$, and $\liminf_n \lambda x_n = \lambda \liminf_n x_n$

(c) If $\lambda < 0$, then $\limsup_n \lambda x_n = \lambda \liminf_n x_n$, and $\liminf_n \lambda x_n = \lambda \limsup_n x_n$

(d) $\limsup_n (x_n + y_n) \le \limsup_n x_n + \limsup_n y_n$

(e) $\liminf_n (x_n + y_n) \ge \liminf_n x_n + \liminf_n y_n$

(f) If $x_n \le y_n$, then $\limsup_n x_n \le \limsup_n y_n$ and $\liminf_n x_n \le \liminf_n y_n$

Proof: Here's a proof of (a): Suppose that $z < \liminf_n x_n$; then $x_n > z$ eventually, so also
$x_n > z$ infinitely often. It follows that $z \le \limsup_n x_n$. Hence$^1$ $\limsup_n x_n \ge z$ whenever
$z < \liminf_n x_n$, and thus $\limsup_n x_n \ge \liminf_n x_n$ as well.
The rest of this proposition is left as an exercise.

$^1$ What we are using here is that, if $a \ge x$ whenever $x < b$, then also $a \ge b$. For suppose that $a < b$. Choose
$x$ such that $a < x < b$. Then $x < b$, so $a \ge x$, which is a contradiction, since $a < x$. Hence $a \not< b$, i.e. $a \ge b$.






H 

Exercise A.3.6 Prove the remainder of Proposition A.3.5.

[Hints: (c) If $x_n > z$ infinitely often and $\lambda < 0$, then $\lambda x_n < \lambda z$ infinitely often.

(d) If $z > \limsup_n x_n$ and $w > \limsup_n y_n$, then $x_n < z$ eventually and $y_n < w$ eventually. Hence $x_n + y_n < z + w$ eventually.]

□ 
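
The inequalities for sums in Proposition A.3.5 can be strict. The following worked example is an illustration chosen here (it is not taken from the text): two oscillating sequences whose oscillations cancel.

```latex
% Assumed example sequences: x_n = (-1)^n and y_n = (-1)^{n+1}.
\[
  x_n + y_n = (-1)^n + (-1)^{n+1} = 0 \quad \text{for all } n,
\]
\[
  \limsup_n (x_n + y_n) = 0 \;<\; 2 = 1 + 1 = \limsup_n x_n + \limsup_n y_n,
\]
\[
  \liminf_n (x_n + y_n) = 0 \;>\; -2 = (-1) + (-1) = \liminf_n x_n + \liminf_n y_n .
\]
```

So equality in the sum inequalities cannot be expected in general, even for bounded sequences.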

We are still not done with limsup and lim inf. It is also possible to find a characterization
in terms of subsequences. Roughly speaking, if you write down all the terms of a sequence
$(x_n)_n$, and then delete some of these terms, what remains is a subsequence. (However, you
can't delete so many terms that there are only finitely many left.)

This is best understood by looking at some examples: The sequence 2, 3, 5, 7, 11, ... of primes
is a subsequence of the sequence 1, 2, 3, 4, ... of natural numbers:

$\cancel{1}, 2, 3, \cancel{4}, 5, \cancel{6}, 7, \cancel{8}, \cancel{9}, \cancel{10}, 11, \dots$

In the subsequence, the order of elements remains the same as what it was in the original: 2
comes before 3 comes before 5, etc., in both sequences.

The sequence 3, 2, 6, 5, 9, 8, ... is not a subsequence of 1, 2, 3, 4, 5, .... Not only have we
deleted all numbers of the form $3n - 2$, we have also rearranged the remaining ones so that $3n$ is before
$3n - 1$. In the sequence of natural numbers, 2 is before 3, but in this new sequence, 3 is before
2. Such rearrangements are not allowed when you construct a subsequence.
The following definition should now make sense: 

Definition A.3.7 Let $(x_n)_n$ be a sequence in $\mathbb{R}$, and suppose that $(n_k)_k$ is a strictly
increasing sequence in $\mathbb{N}$ (i.e. $n_1 < n_2 < n_3 < \dots$). Then the sequence

$(x_{n_k})_k = x_{n_1}, x_{n_2}, x_{n_3}, \dots$

is called a subsequence of $(x_n)_n$.
For example

$(x_{2n})_n = x_2, x_4, x_6, \dots$
$(x_{3n-1})_n = x_2, x_5, x_8, \dots$
$(x_{5^n})_n = x_5, x_{25}, x_{125}, \dots$

are subsequences of $(x_n)_n$.

Remarks A.3.8 1. One easy but useful fact to note is the following: If $n_1 < n_2 < n_3 < \dots$
is a strictly increasing sequence of natural numbers, then $n_k \ge k$ (for each $k \in \mathbb{N}$).

If you can't see this immediately, try proving it by induction. Clearly $n_1 \ge 1$. Now suppose that $n_k \ge k$.
Then $n_{k+1} > n_k \ge k$, and thus $n_{k+1} \ge k + 1$.

2. Note that the $n$ in $(x_n)_n$ is a "dummy" variable, i.e. not really a variable at all. This means
that it doesn't matter if we replace the $n$ by some other symbol $k$: $(x_k)_k$ is exactly the
same as $(x_n)_n$.

For example $(\frac{1}{k})_k = 1, \frac12, \frac13, \dots = (\frac{1}{n})_n$.

In particular, $\lim_k x_k$ is exactly the same as $\lim_n x_n$, $\sup_k x_k$ the same as $\sup_n x_n$, etc.

In the expression $(x_n)_n$, the variable $n$ is a bound variable, constrained to take on all possible values in
the set $\mathbb{N}$. We have a similar situation when we deal with definite integrals: The expression $\int_0^1 x\,dx$ is a
number, namely $\frac12$, and not a variable, even though it seems to have a variable $x$ occurring in it. However,
that $x$ is a bound variable, constrained to take on all possible values between 0 and 1. It doesn't matter if
we replace the $x$ by some other symbol $u$: $\int_0^1 x\,dx$ is exactly the same as $\int_0^1 u\,du$.

□
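
The definition of a subsequence can be mirrored in a few lines of code. This is a minimal sketch under stated assumptions: a sequence is modelled as a Python function of $n$, the index sequence $(n_k)_k$ as a function of $k$, and the helper name `subsequence` is hypothetical. Only finitely many terms are ever inspected.

```python
def subsequence(x, index, terms):
    """Return (n_1..n_terms, x_{n_1}..x_{n_terms}) where n_k = index(k).

    Raises ValueError if the computed indices are not strictly increasing,
    since then they do not define a subsequence.
    """
    ns = [index(k) for k in range(1, terms + 1)]
    if any(a >= b for a, b in zip(ns, ns[1:])):
        raise ValueError("indices are not strictly increasing")
    return ns, [x(n) for n in ns]


x = lambda n: n                      # the sequence 1, 2, 3, 4, ...

# The subsequence (x_{3k-1})_k = x_2, x_5, x_8, ...
ns, ys = subsequence(x, lambda k: 3 * k - 1, 5)

# Remark 1 in miniature: strictly increasing indices satisfy n_k >= k.
indices_dominate = all(n >= k for k, n in enumerate(ns, start=1))

# The rearranged indices 3, 2, 6, 5, ... are rejected.
rearranged = lambda k: 3 * ((k + 1) // 2) - (k + 1) % 2
```

Calling `subsequence(x, rearranged, 4)` raises `ValueError`, matching the remark that rearrangements are not allowed.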

One important type of subsequence is a tail sequence. A tail sequence of $(x_n)_n$ is a subsequence
which consists of all terms of $(x_n)_n$ from some $N$ onwards, e.g. 5, 6, 7, ... is a tail sequence of
1, 2, 3, .... Similarly $\frac18, \frac1{16}, \dots$ is a tail sequence of $1, \frac12, \frac14, \frac18, \dots$. Thus $(y_n)_n$ is a tail
sequence of $(x_n)_n$ iff there is an integer $N \ge 0$ such that $y_n = x_{N+n}$.

Example A.3.9 If

$x_n = \begin{cases} \frac{1}{n} & \text{if } n \text{ is odd} \\ 1 + \frac{1}{n^2} & \text{if } n \text{ is even} \end{cases}$

then $(x_n)_n$ is divergent. However, the sequences $(y_n)_n$, $(z_n)_n$ defined by

$y_n = x_{2n+1} = \frac{1}{2n+1} \qquad z_n = x_{2n} = 1 + \frac{1}{4n^2}$

are convergent subsequences of $(x_n)_n$, with $y_n \to 0$ and $z_n \to 1$.

If you think long enough, it should be clear that a subsequence $(x_{n_k})_k$ of $(x_n)_n$ converges if and only if:
EITHER the sequence $(n_k)_k$ is odd eventually (in which case $x_{n_k} \to 0$), OR $(n_k)_k$ is even eventually (in which
case $x_{n_k} \to 1$).

Similarly, if $(n_k)_k$ is BOTH odd infinitely often and even infinitely often, then $(x_{n_k})_k$ diverges.

□ 

The following proposition claims two things: 

• If $(x_n)_n$ is a convergent sequence, then every subsequence of $(x_n)_n$ converges, and to
the same limit.

• If a tail sequence of $(x_n)_n$ is convergent, then $(x_n)_n$ is itself convergent, and to the same
limit.

Proposition A.3.10 (a) If $x_n \to x$, and if $(y_n)_n$ is a subsequence of $(x_n)_n$, then $y_n \to x$
as well.

(b) If $(y_n)_n$ is a tail sequence of $(x_n)_n$, and if $y_n \to x$, then also $x_n \to x$.



Exercise A.3.11 We prove Propn. A.3.10:

(a) Suppose that $y_k = x_{n_k}$, where $n_1 < n_2 < n_3 < \dots$. We must show that $y_k \to x$, i.e. that for every $\varepsilon > 0$,
there is a $K \in \mathbb{N}$ such that $|y_k - x| < \varepsilon$ whenever $k \ge K$.

So let $\varepsilon > 0$ be given. First explain why we can choose $M \in \mathbb{N}$ such that $|x_m - x| < \varepsilon$ whenever $m \ge M$.
Now use Remarks A.3.8 to explain why we may choose $K \in \mathbb{N}$ such that $n_k \ge M$ whenever $k \ge K$. Finally,
show that if $k \ge K$, we have

$|y_k - x| = |x_{n_k} - x| < \varepsilon$

and conclude.

(b) Suppose that $(y_n)_n$ is a convergent tail sequence of $(x_n)_n$, and that $y_n \to x$. We must show that also
$x_n \to x$. So let $\varepsilon > 0$. Explain why there is $N$ such that $n \ge N$ implies $|y_n - x| < \varepsilon$. Next explain why
there is a non-negative integer $M$ such that $y_n = x_{n+M}$. Conclude that

$|x_n - x| < \varepsilon$ whenever $n \ge N + M$

□ 



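Example A.3.12 Consider again the sequence

$x_n = \begin{cases} \frac{1}{n} & \text{if } n \text{ is odd} \\ 1 + \frac{1}{n^2} & \text{if } n \text{ is even} \end{cases}$

The sequences

$y_n = x_{2n+1} = \frac{1}{2n+1} \qquad z_n = x_{2n} = 1 + \frac{1}{4n^2}$

are subsequences of $(x_n)_n$. Since $y_n \to 0$ and $z_n \to 1$, we can conclude that $(x_n)_n$ is divergent. For if $(x_n)_n$
converges (to $x$, say), then all its subsequences would also converge to the same limit $x$. But here we have two
subsequences which converge to different limits.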

□ 
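
A quick numerical look at this example may help. The sketch below assumes the sequence is $x_n = \frac1n$ for odd $n$ and $x_n = 1 + \frac{1}{n^2}$ for even $n$, with the subsequences $y_n = x_{2n+1}$ and $z_n = x_{2n}$; it samples finitely many terms, so it illustrates the two limits rather than proving anything.

```python
def x(n):
    """Assumed sequence: 1/n for odd n, 1 + 1/n^2 for even n."""
    return 1 / n if n % 2 == 1 else 1 + 1 / n ** 2


# First 1000 terms of the odd-indexed and even-indexed subsequences.
y = [x(2 * n + 1) for n in range(1, 1001)]
z = [x(2 * n) for n in range(1, 1001)]

# The two subsequences settle near different values (0 and 1),
# so the full sequence cannot converge.
gap = z[-1] - y[-1]
```

Here `y[-1]` is $\frac{1}{2001}$ and `z[-1]` is $1 + \frac{1}{4\,000\,000}$, so the gap between the two subsequences stays close to 1 however far out one samples.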



The next exercise is preparation for Propn. A.3.14:

Exercise A.3.13 (a) Let $x_n = (-1)^n + \frac{1}{n}$. Find a decreasing subsequence which converges to $\limsup_n x_n$.
Is there an increasing subsequence which converges to $\limsup_n x_n$?

(b) Let $x_n = (-1)^n - \frac{1}{n}$. Find an increasing subsequence which converges to $\limsup_n x_n$. Is there a decreasing
subsequence which converges to $\limsup_n x_n$?



□ 



Proposition A.3.14 Every sequence has a monotone subsequence.

In fact every sequence has a monotone subsequence which converges to $\limsup_n x_n$ (and
similarly, one which converges to $\liminf_n x_n$).



Proof: Given a sequence $(x_n)_n$ we show that there is a monotone subsequence which converges
to $\limsup_n x_n$. Let $\bar{x} = \limsup_n x_n$, and put $y_n = \sup\{x_m : m \ge n\}$ (so that $y_n \downarrow \bar{x}$).

We distinguish two cases.
Case 1: $\bar{x} < y_n$ for all $n$.

(We allow here the case $\bar{x} = -\infty$.) In that case, we can choose a decreasing subsequence
$(x_{n_k})_k$ inductively, as follows: Let $N_1 = 1$. Since $y_{N_1} > \bar{x}$, there is $n_1 \ge N_1$ so that $x_{n_1} > \bar{x}$.
Next, since $y_n \downarrow \bar{x}$, there is $N_2 > n_1$ such that $y_{N_2} < x_{n_1}$ (i.e. $x_m < x_{n_1}$ for all $m \ge N_2$).
Since, by hypothesis, $y_{N_2} > \bar{x}$, there is $n_2 \ge N_2$ such that $x_{n_2} > \bar{x}$ also. Thus $\bar{x} < x_{n_2} < x_{n_1}$.
Keep going in the same way: Once we have constructed $N_1 < N_2 < \dots < N_k$ and
$n_1 < n_2 < \dots < n_k$ such that

$N_{j+1} > n_j \ge N_j \qquad \bar{x} < x_{n_{j+1}} < x_{n_j} \qquad$ for $j = 1, \dots, k - 1$

we may choose $N_{k+1} > n_k$ so that $y_{N_{k+1}} < x_{n_k}$. Since also $y_{N_{k+1}} > \bar{x}$, there is $n_{k+1} \ge N_{k+1}$
such that $x_{n_{k+1}} > \bar{x}$. Since $n_{k+1} \ge N_{k+1}$, we see that $n_{k+1} > n_k$. Since $x_{n_{k+1}} \le y_{N_{k+1}}$, we
have $\bar{x} < x_{n_{k+1}} < x_{n_k}$.

This completes the construction of the subsequence $(x_{n_k})_k$. Note that $(y_{N_k})_k$ is a subsequence
of $(y_n)_n$, so that $y_{N_k} \downarrow \bar{x}$. Since clearly $\bar{x} < x_{n_k} \le y_{N_k}$ for all $k$, the Sandwich Theorem
ensures that $x_{n_k} \downarrow \bar{x}$ as well.
Case 2: There is $N_0$ such that $y_{N_0} = \bar{x}$.

(We allow here the case $\bar{x} = +\infty$.) In that case, since $(y_n)_n$ is a decreasing sequence converging
to $\bar{x}$, we must have $y_n = \bar{x}$ for all $n \ge N_0$ also. In particular, it follows that $x_n \le \bar{x}$ for all
$n \ge N_0$. Thus either (i) $x_n < \bar{x}$ eventually, or (ii) $x_n = \bar{x}$ infinitely often. If (ii) holds, there
is obviously a constant (hence monotone) subsequence converging to $\bar{x}$, so it remains to deal
with (i).

Suppose therefore that $x_n < \bar{x}$ for all $n \ge N_1$, and let $N = \max\{N_0, N_1\}$. Then

$\forall n \ge N\ (y_n = \bar{x} \land x_n < \bar{x})$

Define $t_n = \bar{x} - \frac{1}{n}$ if $\bar{x}$ is finite, and put $t_n = n$ if $\bar{x}$ is infinite. The point is that $t_n \uparrow \bar{x}$,
whether $\bar{x}$ is finite or not. Inductively construct an increasing subsequence $(x_{n_k})_k$ as follows:
Choose $n_1 = N$, so that $x_{n_1} < \bar{x}$. Because $y_n = \bar{x}$ for all $n \ge N$, there is $n_2 > n_1$
such that $\max\{x_{n_1}, t_2\} < x_{n_2}$; of course, also $x_{n_2} < \bar{x}$. Proceed in the same way: Given
$n_1 < n_2 < \dots < n_k$ such that

$\max\{x_{n_j}, t_{j+1}\} < x_{n_{j+1}} < \bar{x} \qquad$ for $j = 1, \dots, k - 1$

choose $n_{k+1} > n_k$ such that $x_{n_{k+1}} > \max\{x_{n_k}, t_{k+1}\}$. This is possible because
$y_{n_k + 1} = \sup\{x_n : n > n_k\} = \bar{x}$.

In this way we obtain a strictly increasing subsequence $(x_{n_k})_k$ such that $t_k < x_{n_k} < \bar{x}$ for $k \ge 2$.
Since $t_k \to \bar{x}$, we see that $x_{n_k} \to \bar{x}$ also, by the Sandwich Theorem.

⊣

Here is our final characterization of lim sup and lim inf. 



Proposition A.3.15 Suppose that $(x_n)_n$ is a bounded sequence. Then $\limsup_n x_n$ is the
biggest number to which $(x_n)_n$ has a convergent subsequence, and $\liminf_n x_n$ is the smallest
number to which $(x_n)_n$ has a convergent subsequence.

Proof: By Proposition A.3.14, $(x_n)_n$ does have a subsequence converging to $\limsup_n x_n$.

Now suppose that $(x_{n_k})_k$ is a subsequence of $(x_n)_n$, and that $x_{n_k} \to a$, and let $\bar{x} =
\limsup_n x_n$. If $a > \bar{x}$, we may choose $\varepsilon > 0$ such that $a - \varepsilon > \bar{x}$ also (e.g. $\varepsilon = \frac12(a - \bar{x})$ will
do). Since $x_{n_k} \to a$, eventually $x_{n_k} > a - \varepsilon$, i.e. there is $K$ such that $x_{n_k} > a - \varepsilon$ whenever
$k \ge K$. The indices $n_k$ with $k \ge K$ therefore form an infinite set of $n$ for which $x_n > a - \varepsilon$,
and thus $x_n > a - \varepsilon$ infinitely often. It follows that $\limsup_n x_n \ge a - \varepsilon > \bar{x}$, contradicting
$\limsup_n x_n = \bar{x}$. Hence the assumption that $a > \bar{x}$ leads to a contradiction.

It follows that if there is a subsequence $x_{n_k} \to a$, then $a \le \limsup_n x_n$, and hence
$\limsup_n x_n$ is the biggest number to which $(x_n)_n$ has a convergent subsequence.

⊣

The following important proposition is left as an exercise. 



Proposition A.3.16 A bounded sequence $(x_n)_n$ is convergent if and only if $\limsup_n x_n = \liminf_n x_n$.
In that case $\lim_n x_n$ is equal to both $\limsup_n x_n$ and $\liminf_n x_n$:

$\limsup_n x_n = \lim_n x_n = \liminf_n x_n$

Exercise A.3.17 Prove Proposition A.3.16.

[Hint: ($\Rightarrow$): If $(x_n)_n$ converges, then every subsequence converges, and to the same limit.

($\Leftarrow$): Suppose that $\limsup_n x_n = \liminf_n x_n = x$, and let $\varepsilon > 0$. Then $x_n < x + \varepsilon$ eventually, and $x_n > x - \varepsilon$
eventually.]

□ 



Hidden away in the analysis of lim sup and lim inf is the following important theorem: 

Theorem A.3.18 (Bolzano-Weierstrass) Every bounded sequence of reals has a convergent
subsequence.

Proof: By Proposition A.3.14, any sequence $(x_n)_n$ has a monotone subsequence. If $(x_n)_n$ is
bounded, then so is the subsequence. The result follows by Theorem A.2.4.

⊣

A.4 Cauchy Sequences and Completeness 

We have already seen that any bounded increasing sequence converges (Theorem A.2.4). This
allowed us, in Example A.2.6, to conclude that the sequence $((1 + \frac{1}{n})^n)_n$ converges, though
we could not see where it converges to. The Completeness Axiom guarantees the existence of
a limit, even if we do not know what that limit is.

Like a bounded increasing sequence, a Cauchy sequence is a sequence that "ought to"
converge. And, as we shall see, a Cauchy sequence does converge: The existence of a limit is
guaranteed by the Completeness Axiom, even if we do not know what that limit actually is.

Intuitively, a sequence $(x_n)_n$ in $\mathbb{R}$ is a Cauchy sequence if its terms lie eventually arbitrarily
close to each other. This means that from some point onwards, any two terms are "close". If
all terms lie closer and closer together, there should be some point that they are all clustering
around, and that point should be the limit of the sequence.

All this "ought" and "should" needs to be made precise.

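Definition A.4.1 A sequence $(x_n)_n$ in $\mathbb{R}$ is called a Cauchy sequence if and only if

$\sup\{|x_n - x_m| : m, n \ge N\} \to 0$ as $N \to \infty$

i.e. iff for every $\varepsilon > 0$ there is an $N \in \mathbb{N}$ such that

$|x_n - x_m| < \varepsilon$ whenever $n, m \ge N$

i.e. $\forall \varepsilon > 0\ \exists N \in \mathbb{N}\ \forall n \in \mathbb{N}\ \forall m \in \mathbb{N}\ [n \ge N \land m \ge N \to |x_n - x_m| < \varepsilon]$

Note that all terms from some point onwards need to be within $\varepsilon$ of each other, not just
successive terms. Thus, for example, if $N = 100$, then $|x_{301} - x_{156734}| < \varepsilon$.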



Example A.4.2 The sequence $(1 + (-1)^n 2^{-n})_n$ is Cauchy. Indeed, given $\varepsilon > 0$, we may choose $N \in \mathbb{N}$
such that $2^{-N} < \frac{\varepsilon}{2}$. If $n, m \ge N$, then (by the triangle inequality)

$|(1 + (-1)^n 2^{-n}) - (1 + (-1)^m 2^{-m})| \le 2^{-n} + 2^{-m} \le 2^{-N} + 2^{-N} < \varepsilon$

□ 
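
The quantity $\sup\{|x_n - x_m| : m, n \ge N\}$ from Definition A.4.1 can be estimated numerically for the sequence of Example A.4.2. The following is a rough sketch: the helper name `spread` and the cut-off `horizon` are assumptions made here, and the supremum over an infinite tail is replaced by a maximum over finitely many terms.

```python
def spread(x, N, horizon):
    """max |x_n - x_m| over N <= n, m <= horizon.

    For real numbers this equals (largest term) - (smallest term) of the
    window, which avoids looping over all pairs.
    """
    window = [x(n) for n in range(N, horizon + 1)]
    return max(window) - min(window)


# The sequence of Example A.4.2: x_n = 1 + (-1)^n * 2^(-n).
x = lambda n: 1 + (-1) ** n * 2.0 ** (-n)

# The bound from the example: the spread beyond N is at most 2 * 2^(-N).
spreads = {N: spread(x, N, 60) for N in range(1, 21)}
within_bound = all(s <= 2 * 2.0 ** (-N) for N, s in spreads.items())
```

The computed spreads shrink geometrically with $N$, in line with the estimate $2^{-N} + 2^{-N}$ used in the example.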



Lemma A.4.3 If $(x_n)_n$ is a convergent sequence in $\mathbb{R}$, then it is a Cauchy sequence.

Proof: Suppose that $x_n \to x$, and that we are given $\varepsilon > 0$. We must find $N$ such that
$|x_n - x_m| < \varepsilon$ whenever $n, m \ge N$.

Now because $x_n \to x$ there is $N \in \mathbb{N}$ such that $|x_n - x| < \frac{\varepsilon}{2}$ whenever $n \ge N$. In particular,
if $n, m \ge N$ then

$|x_n - x_m| \le |x_n - x| + |x - x_m| < \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon$

Hence $(x_n)_n$ is a Cauchy sequence.

⊣

So any convergent sequence is a Cauchy sequence. And this is not surprising: If the terms
of a sequence $(x_n)_n$ are eventually close to some point $x$ (the limit), then those terms must
also eventually be close to each other.

More importantly, the converse is true: Any Cauchy sequence in $\mathbb{R}$ is convergent. To
prove this, we will need a number of lemmas. We shall prove:

• Every Cauchy sequence is bounded.

• Every bounded sequence has a convergent subsequence.

• If a Cauchy sequence $(x_n)_n$ has a convergent subsequence, then $(x_n)_n$ is itself convergent.

Actually, the second point has already been proved. It is the Bolzano-Weierstrass theorem
(Theorem A.3.18). Thus we need only prove the first and the last point.

Lemma A.4.4 If $(x_n)_n$ is a Cauchy sequence in $\mathbb{R}$, then $(x_n)_n$ is bounded.

Proof: Choose $N \in \mathbb{N}$ such that $|x_n - x_m| < 1$ whenever $n, m \ge N$. (This is possible, because
$(x_n)_n$ is Cauchy; we have taken $\varepsilon = 1$.) Now define

$K = \max\{|x_1|, |x_2|, \dots, |x_N| + 1\}$

We show that $K$ is a bound for $(x_n)_n$, i.e. that $|x_n| \le K$ for all $n \in \mathbb{N}$.

Consider separately the two cases (i) $n < N$, and (ii) $n \ge N$. In case (i), we obviously
have $|x_n| \le K$, by definition of $K$. Suppose therefore that $n \ge N$. In that case, both $n$ and
$N$ are $\ge N$, and thus

$|x_n| \le |x_n - x_N| + |x_N| < 1 + |x_N| \le K$

which finishes case (ii).



Lemma A.4.5 If $(x_n)_n$ is a Cauchy sequence, and if $(x_n)_n$ has a convergent subsequence,
then $(x_n)_n$ itself converges.

Proof: Suppose that $(x_{n_k})_k$ is a subsequence of the Cauchy sequence $(x_n)_n$, and that $x_{n_k} \to x$
(as $k \to \infty$). We show that $x_n \to x$ (as $n \to \infty$).

So let $\varepsilon > 0$. We must show that there is $N \in \mathbb{N}$ such that $|x_n - x| < \varepsilon$ whenever $n \ge N$.
Now because $(x_n)_n$ is a Cauchy sequence, we can find an $N_1$ such that

$n, m \ge N_1$ implies $|x_n - x_m| < \frac{\varepsilon}{2}$

Because $x_{n_k} \to x$, we can find a $K$ such that

$k \ge K$ implies $|x_{n_k} - x| < \frac{\varepsilon}{2}$

Now define $N = \max\{N_1, n_K\}$, and let $n \ge N$. Choose $k \ge K$ large enough so that also
$n_k \ge N$. Then (i) $n, n_k \ge N_1$, and (ii) $k \ge K$. It follows that

$|x_n - x| \le |x_n - x_{n_k}| + |x_{n_k} - x| < \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon$

whenever $n \ge N$.

⊣

Theorem A.4.6 Let $(x_n)_n$ be a sequence in $\mathbb{R}$. Then $(x_n)_n$ converges if and only if it is
a Cauchy sequence.

Proof: ($\Rightarrow$) is Lemma A.4.3.

($\Leftarrow$): If $(x_n)_n$ is a Cauchy sequence, then it is bounded (by Lemma A.4.4). Hence it has a
convergent subsequence (by Theorem A.3.18). It follows that $(x_n)_n$ converges (by Lemma
A.4.5).

⊣

The fact that Cauchy sequences converge in $\mathbb{R}$ depends very much on the Completeness
Axiom. If you look back over the proof of Theorem A.4.6, you will not see the Completeness
Axiom mentioned explicitly. But we do use the Bolzano-Weierstrass Theorem, whose proof
requires the Completeness Axiom. This is made clear by the following exercise:

Exercise A.4.7 We define temporarily the following notions for subsets $A \subseteq \mathbb{R}$. We say that

• $A \subseteq \mathbb{R}$ is a complete set if and only if whenever $B \subseteq A$ is bounded above (below) then $\sup B \in A$
($\inf B \in A$).

• $A \subseteq \mathbb{R}$ is a Cauchy set if and only if whenever $(a_n)_n$ is a Cauchy sequence in $A$, then $(a_n)_n$
converges to an element $a$ in $A$.

(1.) Which of the following sets are complete; which are Cauchy?

$[0,1]$, $(0,1)$, $\mathbb{R}$, $\mathbb{Q}$, $\{\frac{1}{n} : n \in \mathbb{N}\}$, $\{0\} \cup \{\frac{1}{n} : n \in \mathbb{N}\}$

(2.) We prove that $A \subseteq \mathbb{R}$ is a Cauchy set iff it is a complete set.

(a) Suppose that $A \subseteq \mathbb{R}$ is a complete set, and let $(a_n)_n$ be a Cauchy sequence in $A$. Explain why $(a_n)_n$
has a monotone subsequence $(a_{n_k})_k$. Explain why $\lim_k a_{n_k} \in A$. Conclude that $\lim_n a_n \in A$.

(b) Suppose that $A \subseteq \mathbb{R}$ is a Cauchy set, and that $B \subseteq A$ is bounded above. Explain why there is a
sequence $(a_n)_n$ in $A$ such that $a_n \to \sup B$. Conclude that $\sup B \in A$.

□ 

Exercises A.4.8 1. (a) Prove that if $(x_n)_n$ converges, then $\lim_n (x_{n+1} - x_n) = 0$.

(b) Does the converse hold? i.e., does $\lim_n (x_n - x_{n+1}) = 0$ imply that $(x_n)_n$ converges?

2. (a) Suppose that a sequence $(x_n)_n$ has $|x_{n+1} - x_n| < 2^{-n}$ for all $n \in \mathbb{N}$. Show that $(x_n)_n$ converges.

(b) Does the same hold if we only know that $|x_n - x_{n+1}| < \frac{1}{n}$ for all $n \in \mathbb{N}$?



□ 






Appendix B 

Sets and Logic 



B.1 Logic, Formal Languages, Quantifiers

A formal language is a collection $\mathcal{L}$ of symbols, whose logical symbols include

• Logical Connectives

  $\land$   and
  $\lor$   or
  $\to$   implies
  $\leftrightarrow$   if and only if
  $\neg$   not

It is enough to use just two connectives, e.g. $\land$ and $\neg$. We can then define the remainder by

$\varphi \lor \psi \equiv \neg(\neg\varphi \land \neg\psi)$

$\varphi \to \psi \equiv \neg\varphi \lor \psi$

$\varphi \leftrightarrow \psi \equiv (\varphi \to \psi) \land (\psi \to \varphi)$

Just a reminder: $\lor$ is inclusive-or. $p \lor q$ is true if and only if at least one of $p$, $q$ is true, possibly both.

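• Quantifiers

  $\forall$   For all
  $\exists$   There exists

We have

$\forall x \varphi \equiv \neg \exists x (\neg \varphi) \qquad \exists x \varphi \equiv \neg \forall x (\neg \varphi)$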

• Variables 

• Identity relation 

A special binary relation symbol denoted =. 
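
The reductions among connectives and quantifiers stated above can be checked mechanically. The sketch below is an illustration under stated assumptions: truth values are Python booleans, the function names (`NOT`, `AND`, `OR`, `IMPLIES`, `IFF`, `forall`, `exists`) are hypothetical, and the quantifiers range over a small finite domain chosen here, which can only illustrate, not prove, the general equivalences.

```python
from itertools import product


def NOT(p):
    return not p


def AND(p, q):
    return p and q


# The remaining connectives, defined only from AND and NOT.
def OR(p, q):
    return NOT(AND(NOT(p), NOT(q)))


def IMPLIES(p, q):
    return OR(NOT(p), q)


def IFF(p, q):
    return AND(IMPLIES(p, q), IMPLIES(q, p))


# Compare against the usual truth tables on all four assignments.
connectives_ok = all(
    OR(p, q) == (p or q)
    and IMPLIES(p, q) == (not (p and not q))
    and IFF(p, q) == (p == q)
    for p, q in product([False, True], repeat=2)
)

# Quantifier duality on an assumed finite domain {0, 1, 2, 3, 4}.
DOMAIN = range(5)


def forall(pred):
    return all(pred(x) for x in DOMAIN)


def exists(pred):
    return any(pred(x) for x in DOMAIN)


predicates = [lambda x: x >= 0, lambda x: x > 2, lambda x: x > 10]
duality_ok = all(
    forall(p) == (not exists(lambda x: not p(x)))
    and exists(p) == (not forall(lambda x: not p(x)))
    for p in predicates
)
```

Both flags come out true, which is a sanity check that two connectives (and one quantifier) suffice in principle.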

Logical symbols have the same meaning, regardless of context. $\mathcal{L}$ also has non-logical
symbols, whose meaning depends on context:

• Relation symbols

For example, if we want to talk about partial orderings, we will want a symbol $\le$; if we
want to talk about sets, we will want symbols $\in$ and $\subseteq$.

• Function symbols

For example, if we want to talk about arithmetic, we will want binary function symbols
$+$, $\times$. We may want unary function symbols such as $-$. If we want to talk about sets, we
will want binary function symbols $\cap$, $\cup$, and unary function symbols such as ${}^c$ (complement).

• Constant symbols

These are specially named elements, and are often regarded as nullary function symbols.
For example, if we want to talk about addition, a distinguished element denoted by 0
plays an important role. If we want to talk about sets, the set $\emptyset$ deserves its own name.

A formal language will generally not contain all of the above non-logical symbols, only those
needed to talk about the domain of discourse. $\mathcal{L}$ will also have brackets (, ), [, ], etc.

The symbols of a formal language may be "strung" together to form two types of strings: terms
and formulas.

• Terms are defined as follows:

(i) Every variable and every constant is a term;

(ii) If $t_1, \dots, t_n$ are terms, and if $F$ is an $n$-ary function symbol, then $F(t_1, \dots, t_n)$ is
a term;

(iii) A string is a term only if it can be shown to be so by a finite number of applications
of (i) and (ii);

• Formulas are defined as follows:

(i) If $t_1, \dots, t_n$ are terms, and if $R$ is an $n$-ary relation symbol, then $R(t_1, \dots, t_n)$ is a
formula. (This includes the case where $R$ is the logical binary relation symbol $=$.)

(ii) If $\varphi$, $\psi$ are formulas, then so are $(\varphi \land \psi)$, $(\varphi \lor \psi)$, $(\varphi \to \psi)$, $(\varphi \leftrightarrow \psi)$;

(iii) If $\varphi$ is a formula, then so is $\neg\varphi$;

(iv) If $\varphi$ is a formula and $x$ is a variable, then $\forall x \varphi$ and $\exists x \varphi$ are formulas;

(v) A string is a formula only if it can be shown to be so by a finite number of
applications of (i)-(iv).

We often omit brackets when there is no danger of confusion. Moreover, we may also abbreviate
$\forall x \forall y \varphi$ by $\forall x, y\, \varphi$.

If $\varphi$ is a formula, we write $\varphi(x, y, z)$ to show that the variables of $\varphi$ are (amongst) $x, y, z$.

Example B.1.1 Partial orderings

Consider the following language $\mathcal{L}$: In addition to the logical symbols, $\mathcal{L}$ has a single binary relation symbol
$\le$. There are no function and constant symbols. Thus the only terms of $\mathcal{L}$ are the variables. Some examples of
formulas are

$x \le y, \qquad \forall x (x \le y \land y \le z) \to \exists z (\neg(z \le x))$

The theory of partial orderings has the following axioms

(i) $\forall x (x \le x)$;

(ii) $\forall x, y\, (x \le y \land y \le x \to x = y)$;

(iii) $\forall x, y, z\, (x \le y \land y \le z \to x \le z)$.

This theory has many interpretations. One is the two-element chain $C_2 = \{0, 1\}$ with $0 \le 1$. This is a linear
ordering, i.e. it satisfies the axiom $\forall x, y\, (x \le y \lor y \le x)$. Another example is the powerset $\mathcal{P}(A)$ of a set $A$,
where $\le$ is interpreted as "subset". This ordering is non-linear if $A$ has more than one element.
Thus different structures may satisfy the same axioms.

□ 
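
For finite interpretations, the three axioms of a partial ordering and the linearity axiom can be checked by brute force. The sketch below assumes a structure is given as a list of elements together with a Python function playing the role of $\le$; the helper names (`powerset`, `is_partial_order`, `is_linear`) are hypothetical.

```python
from itertools import combinations, product


def powerset(s):
    """All subsets of s, as frozensets."""
    items = sorted(s)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


def is_partial_order(elems, leq):
    """Check axioms (i)-(iii): reflexive, antisymmetric, transitive."""
    reflexive = all(leq(x, x) for x in elems)
    antisymmetric = all(x == y for x, y in product(elems, repeat=2)
                        if leq(x, y) and leq(y, x))
    transitive = all(leq(x, z) for x, y, z in product(elems, repeat=3)
                     if leq(x, y) and leq(y, z))
    return reflexive and antisymmetric and transitive


def is_linear(elems, leq):
    """Check the extra axiom: any two elements are comparable."""
    return all(leq(x, y) or leq(y, x) for x, y in product(elems, repeat=2))


# Interpretation 1: the two-element chain {0, 1} with the usual order.
chain = [0, 1]
number_leq = lambda x, y: x <= y

# Interpretation 2: the powerset of a two-element set, ordered by inclusion
# (for frozensets, <= is the subset test).
subsets = powerset({"a", "b"})
subset_leq = lambda x, y: x <= y
```

Both structures satisfy the partial-order axioms, but only the chain is linear: in the powerset, `{"a"}` and `{"b"}` are incomparable.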

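Finally, a note about negating quantifiers: A negation sign can "creep" past a quantifier,
but it flips the quantifier in the process:

$\neg \forall x \varphi \equiv \exists x (\neg \varphi) \qquad \neg \exists x \varphi \equiv \forall x (\neg \varphi)$

For example,

$\neg[\forall x \exists y (y > x)] \equiv \exists x \neg[\exists y (y > x)] \equiv \exists x \forall y (y \not> x)$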

B.2 Basic Set Theory 
B.2.1 Sets 

In the early twentieth century, the following principle was established: 
All mathematical objects are sets. 

All mathematical notions can be expressed as relationships between sets. 

Intuitively, a set is just a collection of objects. 



If $A$ is a set and $x$ is some mathematical object, then we say

$x \in A$ ($x$ is an element of $A$)

if $x$ is amongst the objects that are collected in $A$.

A set is characterized entirely by its elements. Two sets are the same if and only if they
have the same elements:

$A = B \iff \forall x [x \in A \leftrightarrow x \in B]$

Instead of set, we will also say collection, family or class. Instead of saying $x$ is an element
of $A$, we may also say $x$ is a member of $A$ or $x$ belongs to $A$.

We say that a set $A$ is a subset of a set $B$ if and only if every element of $A$ belongs to $B$:

$A \subseteq B \iff \forall x [x \in A \to x \in B]$

Thus

$A = B$ iff $(A \subseteq B) \land (B \subseteq A)$

We say that $A$ is a proper subset of $B$ if $A \subseteq B$, but $A \ne B$. We also write $B \supseteq A$ to mean
$A \subseteq B$.

There are two ways to represent a set:

(i) By listing its elements: $A = \{a_i : i \in I\}$

(ii) By some defining property: $A = \{x : \varphi(x)\}$

The following sets have special symbols associated with them:

• The empty set $\emptyset = \{\} = \{x : x \ne x\}$.

• $\mathbb{N} := \{1, 2, 3, \dots\}$ is the set of natural numbers.

• $\mathbb{Z}^+ := \{0, 1, 2, 3, \dots\}$ is the set of non-negative integers.

• $\mathbb{Z} := \{\dots, -2, -1, 0, 1, 2, \dots\}$ is the set of integers.

• $\mathbb{Q} := \{\frac{p}{q} : p \in \mathbb{Z}, q \in \mathbb{N}\}$ is the set of rational numbers.

• $\mathbb{R}$ is the set of real numbers.

• $\mathbb{C}$ is the set of complex numbers.

B.2.2 Union and intersection 

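The symbols $\cup$, $\cap$ denote, respectively, the union and intersection of two sets:

$A \cup B = \{x : x \in A \lor x \in B\}$
$A \cap B = \{x : x \in A \land x \in B\}$

The symbols $\bigcup$ and $\bigcap$ denote, respectively, the union and intersection of a family of sets:
If $\mathcal{A} = \{A_i : i \in I\}$ is a family of sets, then

$\bigcup \mathcal{A} = \bigcup_{i \in I} A_i = \{x : \exists i \in I : x \in A_i\}$
$\bigcap \mathcal{A} = \bigcap_{i \in I} A_i = \{x : \forall i \in I : x \in A_i\}$

Thus

$x \in \bigcup \mathcal{A} \iff \exists A \in \mathcal{A}\, [x \in A]$
$x \in \bigcap \mathcal{A} \iff \forall A \in \mathcal{A}\, [x \in A]$

Note that we write $A \cup B$ instead of $\bigcup \{A, B\}$, etc.

We may also write

$\bigcup_{n=1}^{\infty} A_n = A_1 \cup A_2 \cup A_3 \cup \dots$

for $\bigcup_{n \in \mathbb{N}} A_n$, etc.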

B.2.3 Set difference, complementation and symmetric difference 



If $A$, $B$ are sets, we define the set difference of $A$ and $B$ by

$A - B = \{x : x \in A \land x \notin B\}$

We define the symmetric difference of $A$, $B$ by

$A \triangle B = (A - B) \cup (B - A) = (A \cup B) - (A \cap B)$

Often, we will be working with subsets of some universal set $\Omega$. If $A \subseteq \Omega$, we define the
complement of $A$ by

$A^c = \Omega - A$

Note that if $A, B \subseteq \Omega$ then

$A - B = A \cap B^c$



B.2.4 Set algebra 

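Note the following laws:

• Idempotent laws:

$A \cup A = A \qquad A \cap A = A$

• Commutative laws:

$A \cup B = B \cup A \qquad A \cap B = B \cap A \qquad A \triangle B = B \triangle A$

• Associative laws:

$A \cup (B \cup C) = (A \cup B) \cup C \qquad A \cap (B \cap C) = (A \cap B) \cap C \qquad A \triangle (B \triangle C) = (A \triangle B) \triangle C$

• Distributive laws:

$A \cap (B \cup C) = (A \cap B) \cup (A \cap C) \qquad A \cup (B \cap C) = (A \cup B) \cap (A \cup C)$

$A \cap \bigcup_{i \in I} B_i = \bigcup_{i \in I} (A \cap B_i) \qquad A \cup \bigcap_{i \in I} B_i = \bigcap_{i \in I} (A \cup B_i)$

$(A \triangle B) \cap C = (A \cap C) \triangle (B \cap C)$

(Symmetric difference does not distribute over union in the same way; instead $(A \cup C) \triangle (B \cup C) = (A \triangle B) - C$.)

• Absorption laws:

$A \cap (A \cup B) = A \qquad A \cup (A \cap B) = A$

• Complementation laws:

$A \cap A^c = \emptyset \qquad A \cup A^c = \Omega$ (the universal set)

$(A^c)^c = A \qquad (A \triangle B)^c = A^c \triangle B$

• De Morgan's laws:

$(A \cap B)^c = A^c \cup B^c \qquad (A \cup B)^c = A^c \cap B^c$

$\left(\bigcup_{i \in I} A_i\right)^c = \bigcap_{i \in I} A_i^c \qquad \left(\bigcap_{i \in I} A_i\right)^c = \bigcup_{i \in I} A_i^c$

• Set difference laws:

$A - (B \cup C) = (A - B) \cap (A - C) \qquad A - (B \cap C) = (A - B) \cup (A - C)$

$A - \bigcup_{i \in I} B_i = \bigcap_{i \in I} (A - B_i) \qquad A - \bigcap_{i \in I} B_i = \bigcup_{i \in I} (A - B_i)$

• Symmetric difference laws:

$A \triangle A = \emptyset \qquad A \triangle \emptyset = A \qquad A \triangle \Omega = A^c$

Note also that the trivial fact that $A \triangle B \subseteq A \cup B$ implies the following useful triangle
inequality:

$A \triangle C \subseteq (A \triangle B) \cup (B \triangle C)$

Indeed, we have

$A \triangle C = (A \triangle \emptyset) \triangle C = A \triangle (B \triangle B) \triangle C = (A \triangle B) \triangle (B \triangle C) \subseteq (A \triangle B) \cup (B \triangle C)$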
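
Identities of this kind are easy to sanity-check by brute force on a small universal set. The following sketch assumes $\Omega = \{0, 1, 2\}$ (an arbitrary choice made here) and uses Python's `frozenset` operators: `|` for union, `&` for intersection, `-` for set difference, `^` for symmetric difference and `<=` for the subset test. A finite check of this sort is evidence, not a proof.

```python
from itertools import combinations, product

OMEGA = frozenset(range(3))   # assumed small universal set: 8 subsets


def all_subsets(s):
    items = sorted(s)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


def comp(a):
    """Complement relative to OMEGA."""
    return OMEGA - a


def check_laws():
    """Assert a selection of the laws for every triple (A, B, C) of subsets."""
    subsets = all_subsets(OMEGA)
    checked = 0
    for A, B, C in product(subsets, repeat=3):
        assert A & (B | C) == (A & B) | (A & C)          # distributive
        assert A | (B & C) == (A | B) & (A | C)          # distributive
        assert (A ^ B) & C == (A & C) ^ (B & C)          # sym. diff. over intersection
        assert (A | C) ^ (B | C) == (A ^ B) - C          # sym. diff. and union
        assert A ^ (B ^ C) == (A ^ B) ^ C                # associativity
        assert comp(A ^ B) == comp(A) ^ B                # complementation
        assert comp(A & B) == comp(A) | comp(B)          # De Morgan
        assert A - (B | C) == (A - B) & (A - C)          # set difference
        assert A - (B & C) == (A - B) | (A - C)          # set difference
        assert A ^ C <= (A ^ B) | (B ^ C)                # triangle inequality
        checked += 1
    return checked


checked = check_laws()
```

With three elements in $\Omega$ there are $8^3 = 512$ triples, and every assertion passes for each of them.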
B.2.5 Products 

Since a set is determined completely by its elements, we see that $\{a, b\} = \{b, a\}$: the order
of $a$, $b$ doesn't matter. It is often convenient to have a structure in which order does matter,
however. An ordered pair $(a, b)$ should be thought of as a collection containing $a$ and $b$, in
that order. Thus $(a, b) \ne (b, a)$ (unless $a = b$).

Typically, an ordered pair $(a, b)$ is defined as follows:

$(a, b) = \{\{a\}, \{a, b\}\}$

This is done purely to maintain the principle that all mathematical objects must be sets. The exact definition
of an ordered pair doesn't matter, however. What is important is that

$(a, b) = (c, d) \iff a = c \land b = d$
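
The set-theoretic definition of an ordered pair can be tried out directly. This is a small sketch using Python's `frozenset` as a stand-in for sets (an assumption of the illustration; the function name `pair` is hypothetical), checking the characteristic property on a handful of values.

```python
from itertools import product


def pair(a, b):
    """The ordered pair (a, b) encoded as the set {{a}, {a, b}}."""
    return frozenset({frozenset({a}), frozenset({a, b})})


# Unordered sets forget the order, encoded pairs do not.
unordered_equal = frozenset({1, 2}) == frozenset({2, 1})
ordered_differ = pair(1, 2) != pair(2, 1)

# The characteristic property: (a, b) = (c, d) iff a = c and b = d,
# checked over all values drawn from {0, 1, 2}.
property_holds = all(
    (pair(a, b) == pair(c, d)) == (a == c and b == d)
    for a, b, c, d in product(range(3), repeat=4)
)
```

Note that `pair(a, a)` collapses to the one-element set $\{\{a\}\}$, and the characteristic property still holds in that case.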

We will agree to let $(a, (b, c))$ and $((a, b), c)$ denote the same object: In each case, we have
$a$ followed by $b$ followed by $c$. It is convenient to omit the inner pair of brackets, and to denote
this object by the ordered triple $(a, b, c)$.

In the same way, an ordered $n$-tuple $(a_1, a_2, \dots, a_n)$ should be thought of as a collection
containing $a_1, a_2, \dots, a_n$, in that order. We may even consider an $I$-indexed ordered tuple
$(a_i)_{i \in I}$: This is just like the $I$-indexed set $\{a_i : i \in I\}$, except that the order matters. Note
that if the index set $I$ is the set of natural numbers, then $(a_n)_{n \in \mathbb{N}}$ is just a sequence:

$(a_n)_{n \in \mathbb{N}} = a_1, a_2, a_3, \dots$

Given two sets $A$, $B$, we define their cartesian product by

$A \times B = \{(a, b) : a \in A \land b \in B\}$

In the same way, we can define the cartesian product of a family of sets:

$\prod_{i \in I} A_i = \{(a_i)_{i \in I} : a_i \in A_i \text{ for all } i \in I\}$

Note that an element of $\prod_{i \in I} A_i$ can be thought of as a function: The ordered tuple $(a_i)_{i \in I}$ corresponds
to that function $f : I \to \bigcup_{i \in I} A_i$ with the property that

$f(i) = a_i$

Such an $f$ is called a choice function. Thus $\prod_{i \in I} A_i$ is precisely the set of all choice functions $I \to \bigcup_{i \in I} A_i$.
We also define powers of sets as follows:

$A^I := \prod_{i \in I} A_i$ where $A_i := A$ for all $i \in I$

Note that an element of $A^I$ can be thought of as a function: The ordered tuple $(a_i)_{i \in I}$ corresponds to that
function $f : I \to A$ with the property that

$f(i) = a_i$

Thus $A^I$ is precisely the set of all functions from $I$ to $A$.
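
The view of a product as a set of choice functions can be made concrete for finite families. In the sketch below a family $(A_i)_{i \in I}$ is modelled as a Python dict from indices to sets, and a choice function as a dict from indices to chosen elements; the helper names `cartesian_product` and `power` are hypothetical.

```python
from itertools import product


def cartesian_product(family):
    """All choice functions for a finite family {i: A_i}, each as a dict i -> a_i."""
    indices = sorted(family)
    factors = [sorted(family[i]) for i in indices]
    return [dict(zip(indices, choice)) for choice in product(*factors)]


def power(A, I):
    """A^I: all functions from the index set I to A, each as a dict."""
    return cartesian_product({i: A for i in I})


# Assumed example family: A_1 = {a, b}, A_2 = {0, 1, 2}.
family = {1: {"a", "b"}, 2: {0, 1, 2}}
choices = cartesian_product(family)

# Every element of the product picks f(i) from A_i.
valid = all(f[i] in family[i] for f in choices for i in family)

# {0, 1}^{1, 2, 3}: all functions from a 3-element set into a 2-element set.
functions = power({0, 1}, {1, 2, 3})
```

For finite sets the counts behave as expected: the product above has $2 \cdot 3 = 6$ elements, and the power has $2^3 = 8$.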






B.3 The Extended Real Number System 

The extended real line is

$\overline{\mathbb{R}} = [-\infty, \infty] = \mathbb{R} \cup \{\infty, -\infty\}$

This can be topologized as follows: The family of all sets of the forms

$[-\infty, a) \qquad (a, b) \qquad (b, \infty] \qquad a, b \in \mathbb{R}$

forms a base for the topology.

We can also define partial arithmetic operations, as follows: $+$, $\cdot$ agree with ordinary
addition and multiplication when applied to real numbers. Also

$a \pm \infty = \pm\infty + a = \pm\infty \qquad a \in \mathbb{R}$

$a \cdot \infty = \begin{cases} +\infty & \text{if } a > 0 \\ 0 & \text{if } a = 0 \\ -\infty & \text{if } a < 0 \end{cases}$

Analogous results hold for multiplication with $-\infty$.
Similarly

$\infty + \infty = \infty \cdot \infty = \infty \qquad \infty \cdot (-\infty) = (-\infty) \cdot \infty = -\infty$

The expression

$\infty - \infty$

is left undefined.

It is straightforward to check that the commutative, associative, and distributive laws
hold in $\overline{\mathbb{R}}$ whenever both sides of the identity under consideration are defined.
Note that we have defined $0 \cdot \infty = 0$.
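
These conventions differ slightly from IEEE floating-point arithmetic, where `0 * inf` evaluates to NaN rather than 0. The following minimal sketch models the extended reals with Python floats and `math.inf`; the function names `ext_mul` and `ext_add` are hypothetical helpers introduced for this illustration only.

```python
import math

INF = math.inf


def ext_mul(a, b):
    """Product in the extended reals, with the convention 0 * (+-inf) = 0."""
    if a == 0 or b == 0:
        return 0.0
    return a * b


def ext_add(a, b):
    """Sum in the extended reals; inf + (-inf) is left undefined."""
    if math.isinf(a) and math.isinf(b) and a != b:
        raise ValueError("inf - inf is undefined")
    return a + b


# Plain floats disagree with the convention used here:
float_product_is_nan = math.isnan(0 * INF)

# The convention-respecting versions:
zero_times_inf = ext_mul(0, INF)        # 0 by convention
negative_times_inf = ext_mul(-2, INF)   # -inf, since -2 < 0
inf_plus_real = ext_add(INF, 5)         # inf
```

Calling `ext_add(INF, -INF)` raises `ValueError`, reflecting that $\infty - \infty$ is left undefined.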



